
[PHP] file_get_contents and regex


Albino

Hey,

I have this terrible feeling I'm missing something really simple here.

I'm trying to get my head around file_get_contents, combined with regex, to get specific bits of data from web pages.

Let's say I want the "about me" part of my Plurk page: http://www.plurk.com/user/Toddish

the source code is:

Code:
 <p id="about_me">
                    I'm a web designer from the UK who loves games, music, and films :)
<br>If you like any of my Plurks, feel free to add me!
                </p>

so, I tried this:

PHP:
<?php
$data = file_get_contents('http://www.plurk.com/user/Toddish');
$regex = '/<p id="about_me">[.*]<\/p>/';
preg_match($regex,$data,$match);
var_dump($match);
?>

but I get nothing.

any idea what I'm missing?

cheers, rep etc as usual :)
 
•••
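You're not escaping any characters in the regex match (strictly only the / delimiter needs it, but escaping the rest is harmless). More importantly, [.*] is a character class that matches a single literal . or *; you want (.*).
The white space also screws it up: by default . doesn't match newlines, so (.*) stops at the first line break. If you really need the line breaks, there are workarounds.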

Code:
$data = str_replace("\n", '', file_get_contents('http://www.plurk.com/user/Toddish'));
echo preg_match("/\<p id\=\"about_me\"\>(.*)<\/p\>/",$data,$match);
print_r($match);
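
To make the two failure modes concrete, here is a minimal sketch that runs the patterns against an inline copy of the snippet from the first post rather than the live page. The $html string is a stand-in for the fetched page (an assumption for illustration), and the patterns are written without the extra escaping to keep them readable.

```php
<?php
// Stand-in for the fetched page: an inline copy of the snippet from the first post.
$html = "<p id=\"about_me\">\n    I'm a web designer from the UK who loves games, music, and films :)\n<br>If you like any of my Plurks, feel free to add me!\n</p>";

// [.*] is a character class: exactly one literal "." or "*", so this finds nothing.
$classResult = preg_match('/<p id="about_me">[.*]<\/p>/', $html, $m1);

// (.*) means "any characters", but "." stops at newlines by default, so this fails too.
$dotResult = preg_match('/<p id="about_me">(.*)<\/p>/', $html, $m2);

// Stripping the newlines first (as the code above does) lets (.*) reach </p>.
$flat = str_replace("\n", '', $html);
$flatResult = preg_match('/<p id="about_me">(.*)<\/p>/', $flat, $m3);

var_dump($classResult, $dotResult, $flatResult); // int(0) int(0) int(1)
```

Only the third call matches, which is why removing the newlines appears to "fix" the pattern.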


Bruce

P.S. I think this is the forum for code snippets you're sharing with others; its parent forum (just "Programming") is for code help. Though I may be wrong, I'm fairly new ;)
 
•••
cheers, didn't realise you had to escape all of them as well.

Now I am getting all the text after it as well. I tried to limit it via:

PHP:
/\<p id\=\"about_me\"\>(.*){1,400}<\/p\>/

but I now get NULL.

cheers so far, rep added :)

oh, and it does look as though I misclicked and posted in the wrong forum, if a mod could move it please :)
 
•••
You missed escaping the opening bracket on both p tags and the equals sign.
Try
Code:
/\<p id\=\"about_me\"\>(.{1,400})<\/p\>/
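
A small sketch of what putting the {1,400} on the dot does, using a limit of 10 so the effect is easy to see. The two input strings are made up for illustration and are assumed to be already stripped of newlines.

```php
<?php
// Hypothetical one-line inputs, already stripped of newlines.
$short = '<p id="about_me">Hi there</p>';
$long  = '<p id="about_me">This text is longer than ten characters</p>';

// .{1,10} means "between 1 and 10 characters", so the capture is capped at 10.
$pattern = '/<p id="about_me">(.{1,10})<\/p>/';

$shortResult = preg_match($pattern, $short, $shortMatch); // fits within the limit
$longResult  = preg_match($pattern, $long, $longMatch);   // exceeds the limit, no match

var_dump($shortResult, $shortMatch[1] ?? null, $longResult);
```

Note that the limit does not truncate: if the text between the tags is longer than the limit, the whole match fails instead of returning the first N characters.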


Bruce
 
•••
I can't even copy and paste properly, oh dear!

my actual code is:

PHP:
preg_match("/\<p id\=\"about_me\"\>(.*){1,1000}<\/p\>/",$data,$match);
 
•••
This will just match up until the first </p>, even across multiple lines:
PHP:
preg_match("/\<p id\=\"about_me\"\>(.*?)<\/p\>/s",$data,$match);
 
•••
perfect, thanks :)
 
•••
sorry, another few small things :(

I'm not sure about the (.*?). The ? means 0 or 1, so surely that would mean 0 or 1 of any character? But I'm guessing it refers to the </p> instead, I'm just not sure how.

Also, I need an .htaccess file to rewrite some URLs. So I have this:

Code:
RewriteEngine on
RewriteRule ^plurk/([a-z]+)/$ /plurk.php?user=$1

to turn /todd/plurk.php?user=Toddish into /todd/plurk/Toddish

but this doesn't seem to work, I get a 404

any ideas?
 
•••
I got this working as follows:
Code:
RewriteEngine on
RewriteRule plurk/(.*) plurk.php?user=$1
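
mod_rewrite patterns are Perl-compatible regexes, so the same preg_match checks can show why the first rule gave a 404. This is only a sketch: it assumes the .htaccess file sits in /todd/, so the rule is matched against the path relative to that directory (plurk/Toddish). The third pattern is an illustration of a tighter rule, not something from the thread.

```php
<?php
// Path as mod_rewrite would see it in a per-directory (.htaccess) context,
// assuming the .htaccess file sits in /todd/.
$path = 'plurk/Toddish';

// Original rule: lowercase letters only, and a mandatory trailing slash.
$strict = preg_match('#^plurk/([a-z]+)/$#', $path);

// Working rule from this post: anything after "plurk/".
$loose = preg_match('#plurk/(.*)#', $path, $looseMatch);

// A tighter alternative (illustration only): letters and digits in either
// case, with an optional trailing slash.
$tight = preg_match('#^plurk/([A-Za-z0-9]+)/?$#', $path, $tightMatch);

var_dump($strict, $looseMatch[1] ?? null, $tightMatch[1] ?? null);
```

The original pattern fails on two counts: the capital T in Toddish is outside [a-z], and the URL has no trailing slash.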

I also tested Mikor's regex, and it does match across multiple lines.
If you want to throw a number of characters limit in, you can use this:
Code:
$data = file_get_contents('http://www.plurk.com/user/Toddish');
preg_match("/\<p id\=\"about_me\"\>(.{1,400}?)<\/p\>/s",$data,$match);
print_r($match);


Bruce
 
•••
Albino said:
I'm not sure about the (.*?). The ? means 0 or 1, so surely that would mean 0 or 1 of any character?

The ? has 2 different meanings, depending on context.

After a normal character or expression it means one or zero occurrences of it.

After a normally "greedy" operator it makes it non-greedy.

Greedy means that the operator will match as much as possible. * is normally greedy, so if you just use .* it will match everything until the last </p> on the page. By making it non-greedy, it matches as little as it can. So it matches up to the next </p> on the page, i.e. up to the </p> at the end of the <p id="about_me"> paragraph.
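
Both meanings of ? can be seen side by side in a short sketch. The $html fragment is made up for illustration, with a second paragraph added so the greedy and non-greedy results differ. The /s modifier lets . match newlines as well.

```php
<?php
// Meaning 1: after a normal character, ? means "zero or one of it".
$optional = preg_match('/^colou?r$/', 'color'); // the "u" is optional

// Hypothetical page fragment with a second paragraph after the one we want.
$html = "<p id=\"about_me\">\nFirst paragraph\n</p>\n<p>Second paragraph</p>";

// Greedy: .* grabs as much as possible, so the match runs to the LAST </p>.
preg_match('/<p id="about_me">(.*)<\/p>/s', $html, $greedy);

// Meaning 2: after *, the ? makes it non-greedy, so the match stops at the FIRST </p>.
preg_match('/<p id="about_me">(.*?)<\/p>/s', $html, $lazy);

var_dump($optional, trim($greedy[1]), trim($lazy[1]));
```

The greedy capture swallows the closing </p> of the first paragraph and the start of the second, while the non-greedy capture contains only "First paragraph".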
 
•••
cheers, bruce, works fine :)

and thanks for the explanation qbert, that helped a lot!
 
•••