06-20-2008, 10:25 AM   #1   Albino

[PHP] file_get_contents and regex

Hey,

I have this terrible feeling I'm missing something really simple here.

I'm trying to get my head around file_get_contents, combined with regex, to get specific bits of data from web pages.

Let's say I want the "about me" part of my Plurk page: http://www.plurk.com/user/Toddish

The source code is:

Code:
 <p id="about_me">
                    I'm a web designer from the UK who loves games, music, and films :)
<br>If you like any of my Plurks, feel free to add me!
                </p>
so, I tried this:

PHP Code:
<?php
$data = file_get_contents('http://www.plurk.com/user/Toddish');
$regex = '/<p id="about_me">[.*]<\/p>/';
preg_match($regex,$data,$match);
var_dump($match);
?>
but I get nothing.

any idea what I'm missing?

cheers, rep etc as usual

06-20-2008, 11:01 AM   #2   Bruce_KD

You're not escaping any characters in the regex match.
The whitespace also screws it up. If you really need that, there are workarounds, but (.*) does not match newlines by default.

Code:
$data = str_replace("\n", '', file_get_contents('http://www.plurk.com/user/Toddish'));
echo preg_match("/\<p id\=\"about_me\"\>(.*)<\/p\>/",$data,$match);
print_r($match);
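
For what it's worth, the main reason the first attempt returned nothing looks to be the square brackets rather than the missing escapes: [.*] is a character class that matches exactly one literal "." or "*", not "any run of characters". Here is a minimal sketch on a made-up sample string ($html is an assumption standing in for the fetched page). It also suggests that <, > and = work unescaped in this position, so the backslashes above are harmless but probably not required.

```php
<?php
// Hypothetical stand-in for the fetched page source (not the real Plurk HTML).
$html = '<p id="about_me">Hello there</p>';

// [.*] is a character class: exactly one "." or "*" between the tags.
$bracketed = preg_match('/<p id="about_me">[.*]<\/p>/', $html, $m1);

// (.*) is a capture group around "any characters except newline".
$grouped = preg_match('/<p id="about_me">(.*)<\/p>/', $html, $m2);

var_dump($bracketed); // int(0)
var_dump($m2[1]);     // string(11) "Hello there"
```

Using a single-quoted PHP string for the pattern avoids having to escape the double quotes as well.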

Bruce

P.S. I think this is the forum for code snippets you're sharing with others; its parent forum (just "Programming") is for code help. Though I may be wrong, I'm fairly new.

06-20-2008, 11:29 AM   #3   Albino

cheers, didn't realise you had to escape all of them as well.

Now I am getting all the text after it as well. I tried to limit it via:

PHP Code:
/<p id="about_me\"\>(.*){1,400}<\/p\>/
but I now get NULL.

cheers so far, rep added

oh, and it does look as though I misclicked and posted in the wrong forum, if a mod could move it please

06-20-2008, 12:25 PM   #4   Bruce_KD

You missed escaping the first bracket on both p tags and the equals sign.
Try:
Code:
/\<p id\=\"about_me\"\>(.{1,400})<\/p\>/

Bruce

06-21-2008, 02:22 AM   #5   Albino

I can't even copy and paste properly, oh dear!

my actual code is:

PHP Code:
preg_match("/\<p id\=\"about_me\"\>(.*){1,1000}<\/p\>/",$data,$match);
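
A note on why this form probably does not cap the length: in (.*){1,1000} the {1,1000} repeats the whole group, and each repetition of .* can already swallow any number of characters, so nothing is limited. Putting the bound on the dot itself does limit it. Here is a small sketch, with a made-up $html string and a limit of 5 chosen only to make the difference visible:

```php
<?php
// Hypothetical sample; the text between the tags is 11 characters long.
$html = '<p id="about_me">Hello there</p>';

// Quantifier on the group: .* inside is still unbounded, so this matches.
$loose = preg_match('/<p id="about_me">(.*){1,5}<\/p>/', $html, $looseMatch);

// Quantifier on the dot: at most 5 characters allowed, so this does not match.
$tight = preg_match('/<p id="about_me">(.{1,5})<\/p>/', $html, $tightMatch);

var_dump($loose); // int(1)
var_dump($tight); // int(0)
```

If preg_match returns false rather than 0, preg_last_error() can be checked to see whether the regex engine gave up.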

06-21-2008, 03:06 AM   #6   Barrucadu

This will just match up until the first </p>, even across multiple lines:
PHP Code:
preg_match("/\<p id\=\"about_me\"\>(.*?)<\/p\>/s",$data,$match);
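
To see what the trailing s modifier changes, here is a rough sketch on a multi-line sample modelled on the snippet from the first post (the $html string is made up, not the live page). Without s the dot stops at newlines; with it, the dot matches them too, and the lazy (.*?) stops at the first </p>.

```php
<?php
// Hypothetical multi-line sample, followed by a second unrelated paragraph.
$html = "<p id=\"about_me\">\n  line one\n<br>line two\n</p>\n<p>other</p>";

// Without /s the dot cannot cross the newline after the opening tag.
$withoutS = preg_match('/<p id="about_me">(.*?)<\/p>/', $html, $m1);

// With /s the dot also matches newlines; the lazy ? ends at the first </p>.
$withS = preg_match('/<p id="about_me">(.*?)<\/p>/s', $html, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
echo trim($m2[1]), "\n";
```

This also means the str_replace("\n", '', ...) step from earlier should not be needed once s is in place.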

06-21-2008, 05:49 AM   #7   Albino

perfect, thanks

06-21-2008, 11:10 AM   #8   Albino

sorry, another few small things

I'm not sure about the (.*?). The ? means 0 or 1, so surely that would mean 0 or 1 of any character? But I'm guessing it references the </p> instead, just not sure how.

Also, I need a .htaccess file to change some URLs. So I have this:

Code:
RewriteEngine on
RewriteRule ^plurk/([a-z]+)/$ /plurk.php?user=$1
to turn /todd/plurk.php?user=Toddish into /todd/plurk/Toddish

but this doesn't seem to work, I get a 404

any ideas?

Last edited by Albino; 06-21-2008 at 12:11 PM.

06-22-2008, 02:23 PM   #9   Bruce_KD

I got this working as follows:
Code:
RewriteEngine on
RewriteRule plurk/(.*) plurk.php?user=$1
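
As a guess at why the original rule gave a 404: its pattern only allows lowercase letters and requires a trailing slash, so a path like plurk/Toddish fails on both counts. The sketch below uses preg_match as a stand-in for mod_rewrite's matching, and assumes the .htaccess sits in /todd/ so the rule sees the path without that prefix. The $path value and the relaxed pattern are illustrative, not taken from the thread.

```php
<?php
// Path as mod_rewrite would likely see it from /todd/.htaccess (assumption).
$path = 'plurk/Toddish';

// Original pattern: lowercase only, trailing slash required.
$original = preg_match('#^plurk/([a-z]+)/$#', $path);

// Relaxed pattern: mixed case, digits, underscore, optional trailing slash.
$relaxed = preg_match('#^plurk/([A-Za-z0-9_]+)/?$#', $path, $m);

var_dump($original); // int(0)
var_dump($m[1]);     // string(7) "Toddish"
```

The original substitution also starts with a slash (/plurk.php), which would normally point at the site root rather than /todd/; the relative plurk.php above avoids that.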
I also tested Mikor's regex, and it does match across multiple lines.
If you want to throw a character limit in, you can use this:
Code:
$data = file_get_contents('http://www.plurk.com/user/Toddish');
preg_match("/\<p id\=\"about_me\"\>(.{1,400}?)<\/p\>/s",$data,$match);
print_r($match);

Bruce

06-23-2008, 04:45 AM   #10   qbert220

Quote:
Originally Posted by Albino
I'm not sure about the (.*?). The ? means 0 or 1, so surely that would mean 0 or 1 of any character?
The ? has 2 different meanings, depending on context.

After a normal character or expression it means one or zero occurrences of it.

After a normally "greedy" operator it makes it non-greedy.

Greedy means that the operator will match as much as possible. * is normally greedy, so if you just use .* it will match everything until the last </p> on the page. By making it non-greedy, it matches as little as it can. So it matches up to the next </p> on the page, i.e. up to the </p> at the end of the <p id="about_me"> paragraph.
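
Both meanings of ? can be seen in a short sketch. The $html string and the colou?r example are made up for illustration; the page here is a single line, so no s modifier is needed.

```php
<?php
// Meaning 1: after a normal character, ? makes that character optional.
$optional = preg_match('/colou?r/', 'color');

// Hypothetical one-line page with two paragraphs.
$html = '<p id="about_me">About text</p><p>Footer text</p>';

// Greedy .* runs on to the LAST </p> in the string.
preg_match('/<p id="about_me">(.*)<\/p>/', $html, $greedy);

// Meaning 2: after *, ? makes it lazy, so it stops at the FIRST </p>.
preg_match('/<p id="about_me">(.*?)<\/p>/', $html, $lazy);

var_dump($optional);   // int(1)
echo $greedy[1], "\n"; // About text</p><p>Footer text
echo $lazy[1], "\n";   // About text
```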

Last edited by qbert220; 06-23-2008 at 04:48 AM.

06-24-2008, 04:40 AM   #11   Albino

cheers, Bruce, works fine

and thanks for the explanation qbert, that helped a lot!