NamePros
Old 05-10-2007, 10:11 AM THREAD STARTER               #1 (permalink)
New Member
Join Date: May 2007
Posts: 4
 



Building a web site crawler


Hi,
We are in the process of building a customised site crawler, and we have been quite successful so far. But I have a question for the expert coders: is it possible to fetch the last-modified date of a page from anywhere? If so, how is it done?
__________________

Templates with CMS at 50$ http://www.affordablewebsolutions.com/ready-templates.php
Old 05-10-2007, 03:12 PM   #2 (permalink)
Traveller
 
Join Date: Mar 2007
Location: Yet another city
Posts: 1,419
 


Found this:

14.29 Last-Modified

The Last-Modified entity-header field indicates the date and time at which the origin server believes the variant was last modified.

Last-Modified = "Last-Modified" ":" HTTP-date

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

This might only work for static pages, though, where the server can tell when the content has changed (e.g. the web server has no idea whether the database content behind a dynamic page has changed).
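
For what it's worth, reading that header from a crawler could look roughly like the sketch below. It is Python using only the standard library. The helper names `parse_last_modified` and `fetch_last_modified` are made up for illustration, and the sketch assumes the server sends the header at all, which many dynamic pages do not.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional
from urllib.request import Request, urlopen


def parse_last_modified(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the Last-Modified value as a datetime, or None if absent or invalid."""
    raw = headers.get("Last-Modified")
    if raw is None:
        return None
    try:
        # HTTP-date uses the same layout as e-mail dates, so the e-mail parser works.
        return parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None


def fetch_last_modified(url: str, timeout: float = 10.0) -> Optional[datetime]:
    """Send a HEAD request (hypothetical helper) and parse the response header."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return parse_last_modified(response.headers)
```

Calling `fetch_last_modified("http://example.com/")` returns a datetime when the header is present and `None` otherwise, so the crawler has to be ready for the `None` case.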
__________________
NameCooler.com
Old 05-12-2007, 03:50 PM   #3 (permalink)
NamePros Member
Join Date: Feb 2006
Location: Online
Posts: 118
 



The HTTP Last-Modified header only works if the web server supports it. If you are crawling sites on servers that are not yours, you shouldn't rely on that header. Besides that, I have the feeling that you would like to fetch only the differences since the last update. Right? If so, then the answer is that you should code it yourself. Fetch a page, save that copy in a database (or a file if you like) and compare the saved copy with the online version. That's how you should do it.
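
A minimal sketch of that save-and-compare idea, in Python with the standard library only. The function name `diff_against_saved` is made up for illustration, and a plain dict stands in for the database or file mentioned above. The page body is assumed to have been fetched already.

```python
import difflib
from typing import Dict, List


def diff_against_saved(url: str, new_body: str, store: Dict[str, str]) -> List[str]:
    """Compare new_body with the saved copy for url, then save new_body.

    Returns unified-diff lines; an empty list means nothing changed.
    A URL seen for the first time is diffed against an empty document.
    """
    old_body = store.get(url, "")
    diff = list(
        difflib.unified_diff(
            old_body.splitlines(),
            new_body.splitlines(),
            fromfile="saved",
            tofile="online",
            lineterm="",
        )
    )
    # Overwrite the saved copy so the next crawl compares against this version.
    store[url] = new_body
    return diff
```

An empty result means the page is unchanged since the last crawl. For large sites it may be cheaper to store a hash of each page instead of the full copy, at the cost of no longer being able to show what changed.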
__________________
A soul?... I've got not use for such frivolities.
Old 05-28-2007, 07:44 AM THREAD STARTER               #4 (permalink)
New Member
Join Date: May 2007
Posts: 4
 



Thanks, buddies. But I could not find a practical solution for this. Maybe it's yet to be resolved for dynamic sites.
Old 05-31-2007, 12:20 AM   #5 (permalink)
NamePros Regular
 
Join Date: Jan 2006
Location: San Diego, CA
Posts: 735
 



more posts from you guys please