NamePros.Com > Design and Development > Programming
06-20-2008, 10:25 AM · #1
Albino
Munky Designs
Trader Rating: (11)
Join Date: May 2005
Posts: 984
[PHP] file_get_contents and regex

Hey,

I have this terrible feeling I'm missing something really simple here.

I'm trying to get my head around file_get_contents, combined with regex, to get specific bits of data from web pages.

Let's say I want the "about me" part of my Plurk page: http://www.plurk.com/user/Toddish

The source code is:

Code:
<p id="about_me"> I'm a web designer from the UK who loves games, music, and films :) <br>If you like any of my Plurks, feel free to add me! </p>


So I tried this:

PHP Code:
<?php
$data = file_get_contents('http://www.plurk.com/user/Toddish');
$regex = '/<p id="about_me">[.*]<\/p>/';
preg_match($regex,$data,$match);
var_dump($match);
?>


But I get nothing.

Any idea what I'm missing?

Cheers, rep etc. as usual.


06-20-2008, 11:01 AM · #2
Bruce_KD
NamePros Member
Trader Rating: (1)
Join Date: Sep 2006
Posts: 76
You're not escaping any characters in the regex match.
The newlines also screw it up: (.*) doesn't match across line breaks. If you really need to keep them there are workarounds, but here I just strip them out first:

Code:
$data = str_replace("\n", '', file_get_contents('http://www.plurk.com/user/Toddish'));
echo preg_match("/\<p id\=\"about_me\"\>(.*)<\/p\>/", $data, $match);
print_r($match);
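A minimal sketch of why the original pattern found nothing, run against a hypothetical copy of the about_me HTML from the first post instead of the live page. In a regex, [.*] is a character class that matches exactly one character, either "." or "*", whereas (.*) captures any run of characters. Escaping <, >, = and " is harmless but optional in PCRE, since none of them are metacharacters; with a single-quoted pattern the only character that must be escaped here is the / delimiter inside </p>. The last two checks show the newline problem: without the s modifier, "." does not match "\n", which is why stripping the newlines helps.

```php
<?php
// Hypothetical sample: the about_me paragraph from the first post.
$html = '<p id="about_me"> I\'m a web designer from the UK who loves games, music, and films :) <br>If you like any of my Plurks, feel free to add me! </p>';

// [.*] is a character class: exactly ONE character, either "." or "*".
$classHits = preg_match('/<p id="about_me">[.*]<\/p>/', $html);

// (.*) captures any run of characters into $group[1].
// <, >, = and " need no escaping; only the "/" delimiter in </p> does.
$groupHits = preg_match('/<p id="about_me">(.*)<\/p>/', $html, $group);

// The same kind of paragraph with a line break in it: "." stops at "\n"...
$broken = "<p id=\"about_me\">line one\nline two</p>";
$brokenHits = preg_match('/<p id="about_me">(.*)<\/p>/', $broken);

// ...so removing the newlines first (as in the code above) lets it match again.
$strippedHits = preg_match('/<p id="about_me">(.*)<\/p>/', str_replace("\n", '', $broken), $stripped);

var_dump($classHits, $groupHits, $brokenHits, $strippedHits); // int(0), int(1), int(0), int(1)
echo $group[1], "\n";
```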



Bruce

P.S. I think this is the forum for code snippets you're sharing with others; its parent forum (just "Programming") is for code help. Though I may be wrong, I'm fairly new.
06-20-2008, 11:29 AM · #3
Albino
Munky Designs
Cheers, I didn't realise you had to escape all of them as well.

Now I'm getting all the text after it as well. I tried to limit it with:

PHP Code:
/<p id="about_me\"\>(.*){1,400}<\/p\>/


But now I get NULL.

Cheers so far, rep added.

Oh, and it does look as though I posted in the wrong forum by mistake; could a mod move it, please?
06-20-2008, 12:25 PM · #4
Bruce_KD
NamePros Member
You missed escaping the < of the first p tag, the equals sign and the first quote. Also, the {1,400} needs to go inside the brackets, in place of the *.
Try:
Code:
/\<p id\=\"about_me\"\>(.{1,400})<\/p\>/
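A small sketch of why (.*){1,400} didn't act as a length limit while (.{1,400}) does, using a made-up one-line fragment. A quantifier applies to the single item right before it: in (.*){1,400} that item is the whole group, so the group may repeat 1 to 400 times and the .* inside it is still unlimited; in (.{1,400}) it is the ".", so the capture is 1 to 400 characters long. The limit doesn't make the match stop early, though: .{1,400} is still greedy and will run on to a later </p> if one is within reach.

```php
<?php
// Made-up fragment: the about_me paragraph followed by another paragraph.
$html = '<p id="about_me">Short bio</p><p>Other text</p>';

// "." repeated 1-400 times; still greedy, so it reaches the LAST </p> in range.
preg_match('/<p id="about_me">(.{1,400})<\/p>/', $html, $wide);

// With a cap of 9 characters, only the first </p> is reachable.
preg_match('/<p id="about_me">(.{1,9})<\/p>/', $html, $tight);

// With a cap of 5 characters, no </p> is reachable, so there is no match.
$tooShort = preg_match('/<p id="about_me">(.{1,5})<\/p>/', $html);

echo $wide[1], "\n";  // Short bio</p><p>Other text
echo $tight[1], "\n"; // Short bio
var_dump($tooShort);  // int(0)
```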



Bruce
06-21-2008, 02:22 AM · #5
Albino
Munky Designs
I can't even copy and paste properly, oh dear!

My actual code is:

PHP Code:
preg_match("/\<p id\=\"about_me\"\>(.*){1,1000}<\/p\>/",$data,$match);
06-21-2008, 03:06 AM · #6
Barrucadu
Formerly Mikor.
Name: Michael Walker
Location: East Yorkshire, England
Trader Rating: (7)
Join Date: Aug 2005
Posts: 2,539
This will just match up until the first </p>, even across multiple lines:
PHP Code:
preg_match("/\<p id\=\"about_me\"\>(.*?)<\/p\>/s",$data,$match);
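A quick check of what the s modifier adds in Barrucadu's pattern, using a hypothetical fragment where the paragraph wraps onto a second line and another paragraph follows. Without s, "." never matches a newline, so the lazy group can't get past the line break; with s it can, and the lazy *? still stops at the first </p>. Single quotes are used for the patterns here so that " needs no escaping.

```php
<?php
// Hypothetical fragment: about_me wraps onto two lines, another <p> follows.
$html = "<p id=\"about_me\"> I'm a web designer from the UK\n"
      . "who loves games, music, and films :) </p>\n"
      . "<p>Something else</p>";

// No s modifier: "." can't match "\n", so the match fails at the line break.
$withoutS = preg_match('/<p id="about_me">(.*?)<\/p>/', $html);

// s modifier: "." matches "\n" too; the lazy *? stops at the FIRST </p>.
$withS = preg_match('/<p id="about_me">(.*?)<\/p>/s', $html, $match);

var_dump($withoutS, $withS); // int(0), int(1)
echo trim($match[1]), "\n";
```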
06-21-2008, 05:49 AM · #7
Albino
Munky Designs
Perfect, thanks.
06-21-2008, 11:10 AM · #8
Albino
Munky Designs
Sorry, a few more small things.

I'm not sure about the (.*?). The ? means 0 or 1, so surely that would mean 0 or 1 of any character? But I'm guessing it applies to the </p> instead; I'm just not sure how.

Also, I need a .htaccess file to rewrite some URLs. So I have this:

Code:
RewriteEngine on
RewriteRule ^plurk/([a-z]+)/$ /plurk.php?user=$1

to turn /todd/plurk.php?user=Toddish into /todd/plurk/Toddish

But this doesn't seem to work; I get a 404.

Any ideas?

Last edited by Albino : 06-21-2008 at 12:11 PM.
06-22-2008, 02:23 PM · #9
Bruce_KD
NamePros Member
I got this working as follows:
Code:
RewriteEngine on
RewriteRule plurk/(.*) plurk.php?user=$1
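mod_rewrite patterns are PCRE regexes too, so a rough way to see why the first rule never fired is to run both patterns through preg_match in PHP. This assumes the .htaccess sits in /todd/: in a per-directory .htaccess the pattern is matched against the path relative to that directory, so a request for /todd/plurk/Toddish is tested as plurk/Toddish. The original pattern fails on it for two reasons: [a-z] doesn't include the capital T, and the pattern demands a trailing slash. The "tighter" pattern at the end is only a hypothetical middle ground, not something from the thread. Separately, Bruce's rule also drops the leading slash from plurk.php; as I understand the mod_rewrite docs, a substitution without a leading slash stays relative to the .htaccess directory, whereas /plurk.php points at the document root.

```php
<?php
// Assumption: the .htaccess lives in /todd/, so mod_rewrite sees the
// request /todd/plurk/Toddish as the relative path "plurk/Toddish".
$path = 'plurk/Toddish';

$original = '#^plurk/([a-z]+)/$#';          // rule from the question
$bruce    = '#plurk/(.*)#';                 // Bruce's working rule
$tighter  = '#^plurk/([A-Za-z0-9_]+)/?$#';  // hypothetical: capitals allowed, "/" optional

// "#" is used as the delimiter so the "/" in the paths needs no escaping.
$originalHits = preg_match($original, $path);         // 0: capital T, no trailing "/"
$bruceHits    = preg_match($bruce, $path, $bruceM);   // 1
$tighterHits  = preg_match($tighter, $path, $tightM); // 1
$slashHits    = preg_match($tighter, $path . '/');    // 1: trailing "/" also accepted

echo $bruceM[1], ' ', $tightM[1], "\n"; // Toddish Toddish
```

In the .htaccess the tighter version would read RewriteRule ^plurk/([A-Za-z0-9_]+)/?$ plurk.php?user=$1, again as an untested suggestion.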


I also tested Mikor's regex, and it does match across multiple lines.
If you want to add a limit on the number of characters, you can use this:
Code:
$data = file_get_contents('http://www.plurk.com/user/Toddish');
preg_match("/\<p id\=\"about_me\"\>(.{1,400}?)<\/p\>/s", $data, $match);
print_r($match);
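One way to wrap this up for plurk.php, sketched under a few assumptions: the function names are hypothetical, plurk.php is assumed to receive the name as $_GET['user'] via the rewrite rule, and Plurk user names are assumed to be letters, digits and underscores (checking that before building the URL keeps odd input out of the request). The parsing half works on a string, so it can be tried without fetching anything; strip_tags() removes the <br> and trim() drops the surrounding spaces. This version uses Barrucadu's lazy pattern with the s modifier rather than a length limit.

```php
<?php
// Hypothetical helper: extract the about_me text from a page's HTML.
function about_me_from_html(string $html): ?string
{
    // Lazy (.*?) stops at the first </p>; the s modifier lets "." cross newlines.
    if (!preg_match('/<p id="about_me">(.*?)<\/p>/s', $html, $match)) {
        return null; // no about_me paragraph on the page
    }
    // Drop inner tags such as <br>, then the surrounding whitespace.
    return trim(strip_tags($match[1]));
}

// Hypothetical helper: fetch a profile and extract the about_me text.
function plurk_about_me(string $user): ?string
{
    // Assumption: user names only contain letters, digits and underscores.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $user)) {
        return null;
    }
    // @ hides the warning; file_get_contents() returns false on failure.
    $html = @file_get_contents('http://www.plurk.com/user/' . $user);
    if ($html === false) {
        return null;
    }
    return about_me_from_html($html);
}

// Example use in plurk.php (user passed in by the rewrite rule):
// $about = plurk_about_me($_GET['user'] ?? '');
// echo $about === null ? 'Not found' : htmlspecialchars($about);
```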



Bruce
06-23-2008, 04:45 AM · #10
qbert220
NamePros Member
Location: UK
Trader Rating: (22)
Join Date: Jul 2007
Posts: 113
Originally Posted by Albino
I'm not sure about the (.*?). The ? means 0 or 1, so surely that would mean 0 or 1 of any character?



The ? has two different meanings, depending on context.

After a normal character or expression it means zero or one occurrences of that character or expression.

After a normally "greedy" quantifier such as * it makes that quantifier non-greedy.

Greedy means that the quantifier will match as much as possible. * is normally greedy, so if you just use .* it will match everything up to the last </p> on the page (as long as "." is allowed to cross newlines, e.g. with the s modifier). By making it non-greedy, it matches as little as it can, so it matches up to the next </p> on the page, i.e. the </p> at the end of the <p id="about_me"> paragraph.
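Both meanings of ? side by side, in a small sketch with made-up strings: after an ordinary character it makes that character optional, and after * it turns the greedy .* into the lazy .*?.

```php
<?php
// Meaning 1: "u?" means zero or one "u", so both spellings match.
$us = preg_match('/colou?r/', 'color');
$uk = preg_match('/colou?r/', 'colour');

// Meaning 2: "*?" is the non-greedy version of "*".
$text = '<p>one</p><p>two</p>';
preg_match('/<p>(.*)<\/p>/', $text, $greedy); // as much as possible
preg_match('/<p>(.*?)<\/p>/', $text, $lazy);  // as little as possible

var_dump($us, $uk);    // int(1), int(1)
echo $greedy[1], "\n"; // one</p><p>two
echo $lazy[1], "\n";   // one
```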

Last edited by qbert220 : 06-23-2008 at 04:48 AM.
06-24-2008, 04:40 AM · #11
Albino
Munky Designs
Cheers, Bruce, works fine.

And thanks for the explanation, qbert; that helped a lot!
All times are GMT -7.