| Programming PHP, Perl, Ruby on Rails, AJAX, HTML, XHTML, CSS, JavaScript, MySQL and any other coding topics. |
| | THREAD STARTER #1 (permalink) |
| New Member Join Date: May 2007
Posts: 4
| Building a web site crawler
Hi, we are in the process of building a customised site crawler, and we have been quite successful so far. But I have a question for the expert coders: is it possible to fetch the last-modified date of a page from anywhere, and if so, how is it done?
__________________ Templates with CMS at 50$ http://www.affordablewebsolutions.com/ready-templates.php |
| |
| | #2 (permalink) |
| Traveller Join Date: Mar 2007 Location: Yet another city
Posts: 1,419
| Found this:
14.29 Last-Modified: "The Last-Modified entity-header field indicates the date and time at which the origin server believes the variant was last modified."
Last-Modified = "Last-Modified" ":" HTTP-date
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
It might only work for static pages, though, where the server can tell when the content has changed (e.g. the web server has no idea whether the database content behind a dynamic page has changed).
__________________ NameCooler.com |
| |
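A minimal sketch of reading the header quoted in reply #2, assuming Python and only its standard library. The function names `parse_last_modified` and `fetch_last_modified` are made up for illustration and do not come from the thread. The request uses HEAD so that only headers are transferred, and a missing or malformed header is reported as `None`, since many servers (dynamic pages in particular) do not send it.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional
import urllib.request


def parse_last_modified(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the Last-Modified time from response headers, or None if absent or invalid."""
    value = headers.get("Last-Modified")
    if not value:
        return None
    try:
        # HTTP-date uses the same format as e-mail dates, e.g.
        # "Wed, 15 Nov 1995 04:58:08 GMT".
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None


def fetch_last_modified(url: str, timeout: float = 10.0) -> Optional[datetime]:
    """Send a HEAD request to url and return its Last-Modified time, if any."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # response.headers looks names up case-insensitively.
        return parse_last_modified(response.headers)
```

A crawler could call `fetch_last_modified(url)` before downloading a page and skip the download when the returned time is not newer than the one stored from the previous crawl. When the result is `None`, the header is no help and another method is needed.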
| | #3 (permalink) |
| NamePros Member Join Date: Feb 2006 Location: Online
Posts: 118
| The HTTP Last-Modified header only works if the web server sends it. If you are crawling sites on servers that are not yours, you shouldn't rely on that header. Besides that, I have the feeling that you would like to fetch only the differences since the last update. Right? If so, then the answer is that you should code it yourself: fetch a page, save that copy in a database (or a file if you like), and on the next crawl compare the saved copy with the online version. That's how you should do it.
__________________ A soul?... I've got not use for such frivolities. |
| |
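The fetch, save, and compare approach from reply #3 can be sketched roughly as follows, assuming Python. Instead of saving the full copy, this version stores a SHA-256 fingerprint of the page body, which is a substitution made here to keep the stored data small; it can tell that a page changed but not what changed. The names `content_fingerprint` and `has_changed`, and the plain dictionary standing in for the database, are hypothetical.

```python
import hashlib
from typing import Dict


def content_fingerprint(body: bytes) -> str:
    """Return a SHA-256 hex digest identifying this exact page body."""
    return hashlib.sha256(body).hexdigest()


def has_changed(url: str, body: bytes, store: Dict[str, str]) -> bool:
    """Compare body with the fingerprint last saved for url, then save the new one.

    A URL that has never been seen before counts as changed.
    """
    new_fingerprint = content_fingerprint(body)
    old_fingerprint = store.get(url)
    store[url] = new_fingerprint
    return old_fingerprint != new_fingerprint


# Usage: the dict stands in for a database table keyed by URL.
seen: Dict[str, str] = {}
first = has_changed("http://example.com/", b"<html>v1</html>", seen)   # new URL
second = has_changed("http://example.com/", b"<html>v1</html>", seen)  # same body
third = has_changed("http://example.com/", b"<html>v2</html>", seen)   # edited body
```

One caveat for dynamic sites: a page that embeds a timestamp, session token, or rotating advert will produce a different fingerprint on every fetch, so in practice the body may need to be reduced to its main content before hashing.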
| | THREAD STARTER #4 (permalink) |
| New Member Join Date: May 2007
Posts: 4
| Thanks, buddies. But I could not find a practical solution for this. Maybe it's yet to be resolved for dynamic sites.
__________________ Templates with CMS at 50$ http://www.affordablewebsolutions.com/ready-templates.php |
| |