| Programming PHP, Perl, Ruby on Rails, AJAX, HTML, XHTML, CSS, JavaScript, MySQL and any other coding topics. |
| | #1 (permalink) |
| Munky Designs | [PHP] file_get_contents and regex Hey, I have this terrible feeling I'm missing something really simple here. I'm trying to get my head around file_get_contents, combined with regex, to get specific bits of data from web pages. Let's say I want the "about me" part of my Plurk page: http://www.plurk.com/user/Toddish The source code is: Code: <p id="about_me">
I'm a web designer from the UK who loves games, music, and films :)
<br>If you like any of my Plurks, feel free to add me!
</p>
PHP Code: Any idea what I'm missing? Cheers, rep etc. as usual.
| |
| | #2 (permalink) |
| NamePros Member | You're not escaping any characters in the regex match. The line breaks also screw it up: by default, the . in (.*) does not match newline characters. If you really need the line breaks, there are workarounds (such as the s modifier), but the simplest fix is to strip them first. Code: $data = str_replace("\n", '', file_get_contents('http://www.plurk.com/user/Toddish'));
echo preg_match("/\<p id\=\"about_me\"\>(.*)<\/p\>/",$data,$match);
print_r($match);
Bruce P.S. I think this is the forum for code snippets you're sharing with others; its parent forum (just "Programming") is for code help. Though I may be wrong, I'm fairly new. |
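To see why the line breaks matter, here is a minimal sketch that runs the pattern against a hard-coded copy of the snippet from post #1 rather than the live page (the `$html` variable is an assumption for illustration only). Only the `/` delimiter is escaped here; escaping `<`, `=` and `>` as in the post above is harmless but optional.

```php
<?php
// Hard-coded copy of the markup from post #1 (stands in for file_get_contents()).
$html = "<p id=\"about_me\">\nI'm a web designer from the UK who loves games, music, and films :)\n<br>If you like any of my Plurks, feel free to add me!\n</p>";

$pattern = '/<p id="about_me">(.*)<\/p>/';

// With the newlines in place, "." cannot cross a line break, so nothing matches.
$withNewlines = preg_match($pattern, $html, $m1);

// Stripping "\n" first (as in the post above) lets (.*) reach the closing tag.
$flattened = str_replace("\n", '', $html);
$withoutNewlines = preg_match($pattern, $flattened, $m2);

var_dump($withNewlines);    // int(0)
var_dump($withoutNewlines); // int(1)
echo $m2[1], "\n";
```

Adding the `s` modifier to the pattern would be another way to let `.` match newlines without altering the fetched text.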
| |
| | #3 (permalink) |
| Munky Designs | Cheers, didn't realise you had to escape all of them as well. Now I am getting all the text after it as well. I tried to limit it via: PHP Code: Cheers so far, rep added. Oh, and it does look as though I misclicked on the wrong forum; if a mod could move it, please.
| |
| | #8 (permalink) |
| Munky Designs | Sorry, another few small things. I'm not sure on the (.*?). The ? means 0 or 1, so surely that would mean 0 or 1 of any character? But I'm guessing it references the </p> instead, just not sure how. Also, I need a .htaccess file to change some URLs. So I have this: Code:
RewriteEngine on
RewriteRule ^plurk/([a-z]+)/$ /plurk.php?user=$1
but this doesn't seem to work, I get a 404. Any ideas?
Last edited by Albino; 06-21-2008 at 12:11 PM. |
| |
| | #9 (permalink) |
| NamePros Member | I got this working as follows: Code:
RewriteEngine on
RewriteRule plurk/(.*) plurk.php?user=$1
If you want to throw a number of characters limit in, you can use this: Code: $data = file_get_contents('http://www.plurk.com/user/Toddish');
preg_match("/\<p id\=\"about_me\"\>(.*?)<\/p\>/s",$data,$match);
print_r($match);
Bruce |
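One possible reason for the 404 in post #8 (an assumption; the thread never confirms it) is that `^plurk/([a-z]+)/$` only accepts lowercase letters and requires a trailing slash, so a path such as `plurk/Toddish` would not match. In Apache 2.x, mod_rewrite patterns are Perl-compatible regular expressions, so both rules can be tried out with preg_match. A rough sketch, where the sample paths are made up for illustration:

```php
<?php
// Patterns from posts #8 and #9, wrapped in "#" delimiters for preg_match.
$strict = '#^plurk/([a-z]+)/$#';
$loose  = '#plurk/(.*)#';

// Hypothetical request paths, as a rule in a top-level .htaccess would see
// them (no leading slash).
$paths = ['plurk/toddish/', 'plurk/toddish', 'plurk/Toddish'];

$results = [];
foreach ($paths as $path) {
    $results[$path] = [
        // Captured user name, or null when the rule would not fire.
        'strict' => preg_match($strict, $path, $s) ? $s[1] : null,
        'loose'  => preg_match($loose, $path, $l) ? $l[1] : null,
    ];
}
print_r($results);
```

The loose pattern accepts anything after `plurk/`, including a trailing slash, so plurk.php would probably want to validate the `user` value it receives.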
| |
| | #10 (permalink) | |
| NamePros Regular | Quote:
After a normal character or expression, ? means zero or one occurrence of it. After a normally "greedy" operator, it makes that operator non-greedy. Greedy means that the operator will match as much as possible. * is normally greedy, so if you just use .* it will match everything up to the last </p> on the page. By making it non-greedy, it matches as little as it can, so it matches up to the next </p> on the page, i.e. the </p> at the end of the <p id="about_me"> paragraph. Last edited by qbert220; 06-23-2008 at 04:48 AM. | |
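The difference is easy to see on a small string. This is just a sketch: the two-paragraph `$html` fragment is made up for illustration and is not taken from the real Plurk page.

```php
<?php
// Made-up page fragment with two paragraphs, flattened onto one line.
$html = '<p id="about_me">Hello there</p><p>Another paragraph</p>';

// Greedy: .* runs on to the LAST </p> in the string.
preg_match('/<p id="about_me">(.*)<\/p>/', $html, $greedy);

// Non-greedy: .*? stops at the FIRST </p> after the opening tag.
preg_match('/<p id="about_me">(.*?)<\/p>/', $html, $lazy);

echo $greedy[1], "\n"; // Hello there</p><p>Another paragraph
echo $lazy[1], "\n";   // Hello there
```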
| |