| Programming PHP, Perl, Ruby on Rails, AJAX, HTML, XHTML, CSS, JavaScript, MySQL and any other coding topics. |
| | #1 (permalink) |
| New Member | Building a web site crawler Hi, We are in the process of building a customised site crawler. We have been quite successful in building one, but I have a question for the expert coders: is it possible to fetch the last-modified date of a page from anywhere? If so, how is it done?
__________________ Templates with CMS at 50$ http://www.affordablewebsolutions.com/ready-templates.php |
| |
| | #2 (permalink) |
| Traveller | Found this in RFC 2616, section 14.29 (Last-Modified): "The Last-Modified entity-header field indicates the date and time at which the origin server believes the variant was last modified." Syntax: Last-Modified = "Last-Modified" ":" HTTP-date. Source: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html It might only work for static pages, though, where the server can tell when the content has changed (e.g. the web server has no idea whether the database content has changed).
__________________ Internet.geek.nz NameCooler.com Unlimited Domain Name Web Hosting Travel Money Rates |
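To make the header suggestion above concrete, here is a minimal Python sketch (the thread itself names no language, so Python is an assumption). It sends a HEAD request and parses the Last-Modified value if the server supplies one. The function names `parse_http_date` and `fetch_last_modified` are made up for illustration, and the code returns `None` when the header is missing or unparseable, which is likely to be common on dynamic pages.

```python
import urllib.request
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_http_date(value: Optional[str]) -> Optional[datetime]:
    """Parse an HTTP-date header value; return None if absent or malformed."""
    if not value:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None


def fetch_last_modified(url: str, timeout: float = 10.0) -> Optional[datetime]:
    """Send a HEAD request and return the parsed Last-Modified header, if any.

    Hypothetical helper: the server is free to omit the header entirely.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_http_date(response.headers.get("Last-Modified"))
```

A crawler could call `fetch_last_modified(url)` before downloading a page and skip the download when the returned date is not newer than the one it recorded last time, falling back to a full fetch whenever the result is `None`.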
| |
| | #3 (permalink) |
| NamePros Member | The HTTP Last-Modified header only works if the web server supports it. If you are crawling sites on servers that are not yours, you shouldn't rely on that header. Besides that, I have the feeling that you would like to fetch only the differences since the last update. Right? If so, then the answer is that you should code it yourself: fetch a page, save that copy in a database (or a file if you like), and compare the saved copy with the online version. That's how you should do it.
__________________ A soul?... I've got not use for such frivolities. |
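The save-and-compare approach described above could be sketched roughly like this in Python, using SQLite as the "database". The table name `pages` and the helpers `open_store` and `has_changed` are assumptions made for this example, not something from the thread, and fetching the page body is left to the caller.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store that keeps one saved copy per URL."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, body BLOB NOT NULL)"
    )
    return conn


def has_changed(conn: sqlite3.Connection, url: str, body: bytes) -> bool:
    """Compare a freshly fetched body with the saved copy.

    Returns True when the URL is new or its content differs, and in that
    case replaces the saved copy with the new body.
    """
    row = conn.execute(
        "SELECT body FROM pages WHERE url = ?", (url,)
    ).fetchone()
    if row is not None and bytes(row[0]) == body:
        return False
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)", (url, body)
    )
    conn.commit()
    return True
```

Usage would be something like `conn = open_store("crawl.db")` followed by `has_changed(conn, url, body)` after each fetch. Note that a byte-for-byte comparison will report a change on every fetch for pages that embed timestamps, session tokens or rotating ads, so a real crawler would probably want to strip such parts before comparing.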
| |