Tech question(s) about bots?

Chris2412

I am a n00b developer and I would like to talk about bots and how they crawl a web page.

I tried searching for the keyword “bot” on the forum but got nil, just a bunch of random results. I’m sure there is a thread covering this, so a link will do just fine.

So, Google has “bots” that crawl your page. They look for keywords, phrases, etc. for cataloging purposes. Which is good, because you want your website cataloged in their search engine.

But there are other kinds of bots, too. Yes?

Some of these bots are evil pawns sent out to do what, exactly?

Eat your bandwidth?

I read some simple PHP code a year or two ago that basically makes bots sleep (or time out).

I am not even close to HTML 5 yet so perhaps I am getting way ahead of myself.

There’s no harm in asking. Perhaps I can bump this thread as I get more knowledge and have additional questions regarding coding.
 
1
•••
The views expressed on this page by users and staff are their own, not those of NamePros.
Bots are computerized visitors to your website, typically with a very specific purpose. They range from simple to very complex.

There are many varieties of bots, but the most important tend to be search engine crawlers. Crawlers roam your site without interacting, much like a human clicking random links. Their purpose is to gather data. Search engine crawlers mostly gather data about your content and how it's relevant to users. Googlebot and Bingbot are probably the most relevant to you. For the most part, Bingbot copies Googlebot. There are a lot of myths about Googlebot, and even Google often states misleading information. The practice of designing a website to appeal to search engine crawlers is called SEO: Search Engine Optimization.

Googlebot is pretty smart, and does a good job of seeing a website the same way a user would. It likes to see modern coding and design, simple layout with focus on content, emphasis on security, and compatibility with modern browsers. If the text on a website sounds like a sales pitch, Googlebot won't be happy. It also tries to identify what purpose a website serves: if the site doesn't seem to offer any particular service or contribution to the internet, it won't rank the site as highly.

Googlebot focuses particularly on "above-the-fold" content: what's visible when you first load a page, without scrolling. There should be useful content up there, and it should load quickly. Googlebot dislikes when the entire page has to load before the above-the-fold area is legible. If a page takes a long time to load, Googlebot will penalize the site. If the site doesn't load at all--like with the idea you proposed--then the website will be removed from search results on Google. Generally, Bingbot tries to do all of the same things.

There are a lot of crawlers that are used for research purposes. Many are run by non-profit organizations that shape the future of the internet. Others aren't quite as innocent, but for the most part they're still harmless. Almost all crawlers identify themselves with a unique User-Agent header and will listen to any restrictions you describe in your robots.txt file. As an example, the Wayback Machine attempts to crawl every website and archive the history of the internet. You can view historical snapshots of websites using their free service.
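
To make the robots.txt part a bit more concrete, here is a minimal sketch in Python of how a well-behaved crawler might decide whether it is allowed to fetch a page. It uses the standard library's urllib.robotparser. The robots.txt contents, the bot name "ExampleResearchBot", and the example.com URLs are made up for illustration. Keep in mind that robots.txt is only a request: polite crawlers honor it, but nothing forces a malicious bot to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: one made-up research bot is
# blocked from the whole site, everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleResearchBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler asks this question, using its own User-Agent name,
# before requesting each URL.
for agent, url in [
    ("Googlebot", "https://example.com/index.html"),
    ("Googlebot", "https://example.com/private/notes.html"),
    ("ExampleResearchBot", "https://example.com/index.html"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```

With these rules, the first check should come back allowed and the other two disallowed.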

Scrapers are simple bots used to pull structured information off specific pages of websites. Usually the intent is to steal information like e-mail addresses or telephone numbers that can be used for malicious purposes. Sometimes websites will use scrapers to copy content from other websites.

More info to follow when I have more time.
 
3
•••
Aha! Was hoping Paul would weigh in on this topic ...

Taking exception to one comment though:
If the text on a website sounds like a sales pitch, Googlebot won't be happy.

Googlebot doesn't care if you have a sales pitch as long as you have a genuine product or service to sell. Car dealerships can rank very well for car purchase queries and they aren't exactly fine examples of quality content (or, in most cases, good code ... or even unique content). What they do have is a real product (they sell the cars, they're not sending you to an AdSense ad for cars) and they don't have other ads plastered all over the page :).

End of detour - back to your regularly-scheduled bot talk ;)
 
1
•••
If you use Cloudflare on your site they will do a great job of keeping offending bots away (as well as the other benefits).

I thought this too...

Although in the bitcoin world, Cloudflare sites have notoriously been attacked and taken down by DDoS attacks and extorted for bitcoin, even though I thought that's what Cloudflare specialized in (DDoS protection).

I'm unsure why Cloudflare websites get targeted by DDoS extortionists. I'm not sure if it's Cloudflare they are attracted to or if it's just that the bitcoin websites they choose to attack just "happen to use" Cloudflare.

You keep talking about bots, and I'm growing curious whether you are actually talking about DDoS attacks: bots as in the zombie farms used to carry out the attacks sent to your site. You mention "eating your bandwidth" and bots, and this is where my head goes with these questions.

If I'm hitting the nail on the head with what you are referring to, the purpose of these "bots" or attacks is typically just to be annoying: to disrupt your service, to disrupt you.

Like I said above, this has been used as an extortion method where a DDoS is sent to a website, along with an email requesting a bitcoin payment. Emails usually say something like "We found a vulnerability in your website, please pay us xxx amount of btc to xxxxxxxxxxxxxxxxx wallet to get the ddos attack to stop".

Typically, the person writing the letter merely bought the DDoS attack online from some source, usually a group who runs a bot farm for this type of service: you pay them, they attack, you send the email and extort.

If this isn't what you were asking about, then I apologize for misunderstanding.

Cheers! And I hope this helps a little bit!
z3
 
1
•••
Oh, I wanted to add my experience with bots on websites other than DDoS, which is worth mentioning. I've seen it mostly when I build forums, like SMF or WordPress sites. I notice that, without my even advertising the site, tons of bots find it and register with spam addresses, making accounts and posting spammy links to the forum or WordPress site. There are ways to keep them out; I think something like an Ajax script can help keep them out.
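
One common way to keep registration bots out (not necessarily the script mentioned above) is a "honeypot" field: the signup form includes an extra input that is hidden from humans with CSS, and any submission that fills it in is treated as a bot. Here is a rough sketch of the server-side check in Python. The field name "website" and the sample form data are assumptions for illustration, and a real forum would combine this with other checks.

```python
def looks_like_bot(form: dict[str, str], honeypot_field: str = "website") -> bool:
    """Return True when the hidden honeypot field was filled in.

    Humans never see the field, so they leave it empty; naive bots that
    fill in every input they find give themselves away.
    """
    return bool(form.get(honeypot_field, "").strip())


# Hypothetical submissions, for illustration only.
human_signup = {"username": "chris", "email": "chris@example.com", "website": ""}
bot_signup = {
    "username": "cheap-pills",
    "email": "spam@example.com",
    "website": "http://spam.example",
}

print(looks_like_bot(human_signup))  # honeypot left empty
print(looks_like_bot(bot_signup))    # honeypot filled in
```

This only stops simple bots. Anything that renders the page like a real browser, or that is written specifically for your form, will get past it.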
 
1
•••
Well, everything I said was an oversimplification. There are always exceptions, and context is important. The idea is that you shouldn't be creating a website that is blatantly promotional without really contributing anything to the internet. Generally, Googlebot dislikes promotional content, but if everyone in your industry is doing the same thing, it doesn't matter all that much.

Promotional websites that rank higher than useful websites have typically tricked Google in some way. For example, exact match domains are very effective, and can easily be used to outrank competitors or more legitimate websites. Google maintains that exact match domains are not effective, but that's a load of horse dung.

And, of course, if your only purpose is to sell something, then Google's going to expect you to sound somewhat promotional.
 
0
•••
Googlebot dislikes promotional content, but if everyone in your industry is doing the same thing, it doesn't matter all that much.

Promotional websites that rank higher than useful websites have typically tricked Google in some way.

Ecommerce sites aren't "useful" and are "tricking" Google in some way??

It's about query intent. Transactional, Informational, Navigational. Ecomm websites SHOULD rank above informational sites for transactional queries.

If I want to buy a pair of shoes I don't need Wikipedia telling me what a shoe is. If I want pizza, I don't need a scholarly treatise on pizza. If I want to buy a BMW I don't need sites telling me what a car is or an extensive history of the BMW company ... show me some dealerships so I can see what they have in stock. Transactional queries, all :).

--- a-a-a-a-nd, back to bot talk.
 
2
•••
Paul Buonopane, please marry me.

Ok, either you're just really weird (no offense)... or you are actually trying to do something malicious with bots, because otherwise your obsession makes zero sense.

Seriously, it's not a thing you should even think about.

Tech people ARE weird. We obsess over this stuff all the time because we find it fascinating and we're so eager to learn about it.

His question makes absolute sense to me. When I first started web development, one of my clients' websites was hacked. I set out to learn what hacking methods were used so I could prevent it in the future. Learning how hackers hack taught me how to secure my websites. Chris2412 simply wants to learn about bots so he can best deal with them.
 
3
•••
1
•••