Demo
This searches a page's contents for a keyword. If the keyword is found, the URI is added to an array, and every page in the array is scanned for links, which are then scanned for the keyword, and so on.
The returned array has a user-definable maximum size, and there are options to limit the spider to crawling only specific domains.
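To make the crawl loop concrete, here's a rough Python sketch of the idea described above. It is not the original code: the names `crawl`, `fetch_url`, `max_results` and `allowed_domains` are made up for illustration, links are pulled out with a simple regex, and domain matching is assumed to be an exact host comparison.

```python
import re
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

# Naive href extractor; a real HTML parser would be more robust.
LINK_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def fetch_url(url, timeout=10):
    """Download a page and return its text, or None on any error."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None


def crawl(start_url, keyword, fetch=fetch_url, max_results=100,
          allowed_domains=None):
    """Return URLs of pages containing `keyword`, starting at `start_url`.

    Assumption: only pages that contain the keyword have their links
    followed, as in the description above.
    """
    matches = []
    seen = {start_url}
    queue = deque([start_url])
    needle = keyword.lower()

    while queue and len(matches) < max_results:
        url = queue.popleft()
        html = fetch(url)
        if html is None or needle not in html.lower():
            continue
        matches.append(url)

        # Scan the matching page for links and queue the new ones.
        for href in LINK_RE.findall(html):
            link = urldefrag(urljoin(url, href)).url
            parts = urlparse(link)
            if parts.scheme not in ("http", "https"):
                continue
            if allowed_domains and parts.netloc not in allowed_domains:
                continue
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return matches
```

Something like `crawl("http://example.com/", "spider", max_results=100, allowed_domains={"example.com"})` would then return up to 100 matching URLs from that one domain. The `fetch` parameter is only there so the page loader can be swapped out, for example with a cache.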
It's a bit slow because it indexes all the pages on the fly. If you made it store them in a database and then searched through the database, it would be much faster, but this is just a simple bit of code for people. Also, because it can be slow, I wouldn't recommend setting the max number of pages to much more than 100. That's what the demo is set to, and you can see how slow it is (over a minute and a half, lol).
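For anyone who wants to try the database idea, this is roughly what it could look like. It's only a sketch in Python using the standard `sqlite3` module, and the table layout and the function names (`open_index`, `store_page`, `search`) are assumptions, not part of the original code. Pages get stored once, and keyword searches then run against the stored text instead of re-downloading everything.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a small page index. Defaults to an in-memory DB."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return db


def store_page(db, url, body):
    """Save or refresh one crawled page."""
    db.execute(
        "INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)",
        (url, body),
    )
    db.commit()


def search(db, keyword, limit=100):
    """Return up to `limit` URLs whose stored text contains the keyword.

    Case-insensitive substring match (ASCII case folding only).
    """
    rows = db.execute(
        "SELECT url FROM pages "
        "WHERE instr(lower(body), lower(?)) > 0 "
        "ORDER BY url LIMIT ?",
        (keyword, limit),
    )
    return [row[0] for row in rows]
```

A plain substring scan like this still reads every row, so for a big index a proper full-text index would be the next step, but it already avoids fetching pages over the network at search time.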
ToDo:
- robots.txt support
- <meta> robots support
- speed it up a bit
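
For the first two ToDo items, here's a rough idea of how the checks could work, sketched in Python with only the standard library (`urllib.robotparser` and `html.parser`). The helper names `allowed_by_robots` and `meta_robots` are invented for this example, and the robots.txt text is passed in as a string so the caller decides how to download it.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Check a URL against the text of a site's robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def meta_robots(html):
    """Return the set of robots directives found in a page, lowercased."""
    parser = _MetaRobotsParser()
    parser.feed(html)
    return parser.directives
```

A spider could skip any URL where `allowed_by_robots` returns False, leave a page out of the results when `"noindex"` is in `meta_robots(html)`, and not follow its links when `"nofollow"` is present.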