
[PHP] file_get_contents and regex


Albino

Hey,

I have this terrible feeling I'm missing something really simple here.

I'm trying to get my head around file_get_contents, combined with regex, to get specific bits of data from web pages.

Let's say I want the "about me" part of my Plurk page: http://www.plurk.com/user/Toddish

The source code is:

Code:
 <p id="about_me">
                    I'm a web designer from the UK who loves games, music, and films :)
<br>If you like any of my Plurks, feel free to add me!
                </p>

So I tried this:

PHP:
<?php
$data = file_get_contents('http://www.plurk.com/user/Toddish');
$regex = '/<p id="about_me">[.*]<\/p>/';
preg_match($regex,$data,$match);
var_dump($match);
?>

But I get nothing.

Any idea what I'm missing?

Cheers, rep etc. as usual :)
 
•••
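You're not escaping any characters in the regex match.
The white space also screws it up: by default . doesn't match line breaks, so (.*) stops at the end of a line. If you really need the line breaks, there are workarounds, but stripping them out first is the simplest fix.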

Code:
$data = str_replace("\n", '', file_get_contents('http://www.plurk.com/user/Toddish'));
echo preg_match("/\<p id\=\"about_me\"\>(.*)<\/p\>/",$data,$match);
print_r($match);


Bruce

P.S. I think this is the forum for code snippets you're sharing with others; its parent forum (just "Programming") is for code help. Though I may be wrong, I'm fairly new ;)
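
For anyone reading along later, here is a small sketch of why the original pattern found nothing. It runs against a copy of the markup from the first post rather than the live page (the $data string is a stand-in, nothing is fetched), and it relies on PCRE's default behaviour: [.*] is a character class that matches one literal dot or asterisk, and . skips line breaks unless the s modifier is set. Note that <, = and " are not regex metacharacters, so escaping them is harmless but optional; only the / needs a backslash here because it is the delimiter.

```php
<?php
// Stand-in for the fetched page: the markup quoted in the first post.
$data = "<p id=\"about_me\">\n"
      . "    I'm a web designer from the UK who loves games, music, and films :)\n"
      . "<br>If you like any of my Plurks, feel free to add me!\n"
      . "</p>";

// [.*] is a character class: exactly one literal "." or "*".
$withClass = preg_match('/<p id="about_me">[.*]<\/p>/', $data, $m1);

// (.*) means "any run of characters", but "." stops at line breaks by default.
$withoutS = preg_match('/<p id="about_me">(.*)<\/p>/', $data, $m2);

// The s modifier lets "." match line breaks too, so no str_replace is needed.
$withS = preg_match('/<p id="about_me">(.*)<\/p>/s', $data, $m3);

var_dump($withClass, $withoutS, $withS);
```

Only the last call matches, and $m3[1] then holds the paragraph body including its line breaks.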
 
•••
Cheers, didn't realise you had to escape all of them as well.

Now I am getting all the text after it as well. I tried to limit it via:

PHP:
/\<p id\=\"about_me\"\>(.*){1,400}<\/p\>/

But I now get NULL.

Cheers so far, rep added :)

Oh, and it does look as though I clicked on the wrong forum, if a mod could move it please :)
 
•••
You missed escaping the opening bracket on both p tags and the equals sign.
Try:
Code:
/\<p id\=\"about_me\"\>(.{1,400})<\/p\>/


Bruce
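
A quick way to check the length limit, as a sketch only: in (.*){1,400} the {1,400} repeats the whole group, whereas .{1,400} caps the number of characters. The two test strings below are made up for illustration, and the s modifier is added so that . can cross line breaks.

```php
<?php
// Hypothetical inputs: one short "about me" body and one over the limit.
$short = '<p id="about_me">' . str_repeat('a', 10) . '</p>';
$long  = '<p id="about_me">' . str_repeat('a', 500) . '</p>';

// .{1,400} allows between 1 and 400 characters inside the paragraph.
$limited = '/<p id="about_me">(.{1,400})<\/p>/s';

$shortOk = preg_match($limited, $short, $m); // body fits within the limit
$longOk  = preg_match($limited, $long);      // body is too long to match

var_dump($shortOk, strlen($m[1]), $longOk);
```

The short body matches and is captured in $m[1]; the 500-character body does not match at all, so the limit acts as a filter rather than truncating the text.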
 
•••
I can't even copy and paste properly, oh dear!

my actual code is:

PHP:
preg_match("/\<p id\=\"about_me\"\>(.*){1,1000}<\/p\>/",$data,$match);
 
•••
This will just match up until the first </p>, even across multiple lines:
PHP:
preg_match("/\<p id\=\"about_me\"\>(.*?)<\/p\>/s",$data,$match);
 
•••
perfect, thanks :)
 
•••
Sorry, another few small things :(

I'm not sure about the (.*?). The ? means 0 or 1, so surely that would mean 0 or 1 of any character? But I'm guessing it references the </p> instead, just not sure how.

Also, I need a .htaccess file to change some URLs. So I have this:

Code:
RewriteEngine on
RewriteRule ^plurk/([a-z]+)/$ /plurk.php?user=$1
That should make /todd/plurk/Toddish serve /todd/plurk.php?user=Toddish,

but it doesn't seem to work: I get a 404.

Any ideas?
 
•••
I got this working as follows:
Code:
RewriteEngine on
RewriteRule plurk/(.*) plurk.php?user=$1
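
Since mod_rewrite patterns are PCRE, the two rules can be compared with preg_match. This is only a sketch: it assumes the .htaccess lives in /todd/, so the rule sees the path relative to that directory (plurk/Toddish), and the third pattern is a hypothetical middle ground that nobody in the thread posted. The original pattern only accepts lowercase letters and insists on a trailing slash, which would explain the 404 for "Toddish". The leading slash in /plurk.php may matter too, since it points at the site root rather than /todd/.

```php
<?php
// Assumed per-directory path for the request /todd/plurk/Toddish.
$path = 'plurk/Toddish';

$strict = '#^plurk/([a-z]+)/$#'; // pattern from the question
$loose  = '#plurk/(.*)#';        // pattern that worked

// Capital "T" and the missing trailing slash both break the strict pattern.
$strictHit   = preg_match($strict, $path);
$strictLower = preg_match($strict, 'plurk/toddish/');

// The loose pattern captures everything after "plurk/".
$looseHit = preg_match($loose, $path, $m);

// Hypothetical middle ground: letters, digits, underscore, optional slash.
$middle    = '#^plurk/([A-Za-z0-9_]+)/?$#';
$middleHit = preg_match($middle, $path, $mm);

var_dump($strictHit, $strictLower, $looseHit, $m[1], $middleHit, $mm[1]);
```

The loose rule is the easiest to get working; a tighter pattern like the last one keeps odd characters out of the user parameter, though plurk.php should still validate it.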

I also tested Mikor's regex, and it does match across multiple lines.
If you want to throw a character limit in, you can use this:
Code:
$data = file_get_contents('http://www.plurk.com/user/Toddish');
// Non-greedy match of 1 to 400 characters; the s modifier lets . span line breaks.
preg_match("/\<p id\=\"about_me\"\>(.{1,400}?)<\/p\>/s",$data,$match);
print_r($match);


Bruce
 
•••
Albino said:
I'm not sure about the (.*?). The ? means 0 or 1, so surely that would mean 0 or 1 of any character?

The ? has 2 different meanings, depending on context.

After a normal character or expression it means one or zero occurrences of it.

After a normally "greedy" operator it makes it non-greedy.

Greedy means that the operator will match as much as possible. * is normally greedy, so if you just use .* it will match everything until the last </p> on the page. By making it non-greedy, it matches as little as it can. So it matches up to the next </p> on the page, i.e. up to the </p> at the end of the <p id="about_me"> paragraph.
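
The difference is easy to see on a tiny example. The $html string below is made up for illustration (it is not the real Plurk page); the patterns are the greedy and non-greedy forms discussed above, plus one case of ? used as "optional".

```php
<?php
// Made-up page fragment with a second paragraph after the one we want.
$html = '<p id="about_me">About me text</p><p>Another paragraph</p>';

// Greedy: .* runs on to the LAST </p> in the string.
preg_match('/<p id="about_me">(.*)<\/p>/s', $html, $greedy);

// Non-greedy: .*? stops at the FIRST </p> it reaches.
preg_match('/<p id="about_me">(.*?)<\/p>/s', $html, $lazy);

// After an ordinary character, ? just makes that character optional.
$optional = preg_match('/^colou?r$/', 'color');

var_dump($greedy[1], $lazy[1], $optional);
```

Here $greedy[1] swallows the closing tag and the second paragraph, while $lazy[1] is just "About me text".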
 
•••
Cheers, Bruce, works fine :)

and thanks for the explanation qbert, that helped a lot!
 