Old 07-03-2008, 12:13 AM   · #1
Dr_Test
Do raw logs show *everything*?

I have a question... Are raw logs supposed to show *everything* that users are doing on my site? My site uses cPanel, which spits out what I think they call a "raw access log". It seems to show all site file requests, etc., but sometimes I don't see certain things...

Example: Often, someone will load my main page, but instead of getting a log entry showing that they're downloading index.htm, I'll get one showing that they're downloading header.htm (an inline frame that displays my header.htm file).

Am I missing something here?

Also, if there is NO reference in my logs to someone downloading a certain file, does that mean the file was never downloaded, period? I'm wondering how the Yahoo bot used up 20 gigs last month on my tiny site, if it didn't download my huge RAR archives. (There's no mention of them ever being downloaded in the logs...)


Old 07-03-2008, 12:53 AM   · #2
weblord
Name: William R. Nabaza - williamrnabaza.com
Location: Philippines - www.Nabaza.com
A raw log shows you the following:
IP address
date/time
GET <- someone accessed it
the filename (if it's an .exe, it shows that too)
browser used
whether it's a search engine bot
If your file is being downloaded by someone, you can see the IP.
Old 07-05-2008, 02:01 PM   · #3
nielsencl
Name: Chris
Location: Minneapolis
The log files have everything in the way of site activity, so you should see all the page files, graphic files, include files, page errors, and any other kind of file that can be found on your site.

If you don't see any activity for a file, then it may not have been accessed. However, a busy web server may not always record 100% of the activity. I only say this because in some of my reseller accounts I see gaps in the reporting, where it looks like the site was down for a couple of days even though I can tell it was still working. The logging process may not work as it should all the time, but in general it is very good and does show you everything.

If you look at the entire log, you should see by IP what a person does, but the files may not be in the order you expect, and entries from other visitors may be interleaved with theirs.

Keep in mind that the home page could have been a request for ".../index.htm" or for ".../". :-)
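
To make the "see by IP what a person does" idea concrete, here is a minimal Python sketch that groups requested paths by client IP. It assumes the raw access log uses Apache's "combined" format (the layout of the log lines quoted later in this thread); the sample lines and the name `requests_by_ip` are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout: Apache "combined" log format.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def requests_by_ip(lines):
    """Group requested paths by client IP, keeping log order within each IP."""
    grouped = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # lines that do not fit the assumed format are skipped
            grouped[match["ip"]].append(match["path"])
    return dict(grouped)


# Hypothetical sample: two visitors whose requests are interleaved in the log.
sample = [
    '10.0.0.1 - - [03/Jul/2008:05:00:00 -0700] "GET / HTTP/1.1" 200 5120 "-" "ExampleBrowser/1.0"',
    '10.0.0.2 - - [03/Jul/2008:05:00:01 -0700] "GET /index.htm HTTP/1.1" 200 5120 "-" "ExampleBrowser/1.0"',
    '10.0.0.1 - - [03/Jul/2008:05:00:02 -0700] "GET /header.htm HTTP/1.1" 200 2048 "-" "ExampleBrowser/1.0"',
]

for ip, paths in requests_by_ip(sample).items():
    print(ip, paths)
```

Note that the first visitor's home page request shows up as "/" rather than "/index.htm", so a text search for "index.htm" alone would miss it.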
Old 07-05-2008, 02:35 PM   · #4
Dr_Test
Okay, thanks.

So, here's the scenario: I suspect that a certain user might be using a program like UpdatePatrol or NeoDownloader (or both) to keep tabs on my site and basically bum-rush my files at certain intervals (kind of like a bot, I guess). Could this torrent of transfers be creating gaps in my raw logs? I imagine it could put the site under stress if there are suddenly 10 requests for 700 MB RAR files, all at the same instant.
Old 07-05-2008, 05:39 PM   · #5
nielsencl
It's hard to say if you have missing entries or just large gaps between file requests. If your server is really overloaded, it could be that the logging process is a low priority and data gets dropped when the server can't keep up. Your hosting provider may be able to tell you more about that.

Your web stats program that comes with your hosting should show you a section of information with the IP addresses of those that use the most bandwidth. Using something like network-tools.com you can find out some information about them and make sure it's someone you are having a problem with. Then if you are on a Linux host you can block them from downloading any files if they keep using the same IP address.

If you have one or more 700 MB RAR files, then you can easily start burning through your bandwidth even with "normal" requests. One thing to keep in mind is that some clients, like web site copiers, can make many requests for many files at one time. And some programs can make many requests to copy just one file. You may also be getting hit by spiders and bots that are just trying to see what you have on your site. Using a robots.txt file can help keep them away from your large download files.

I have a site with about 25 million expired domain names in HTML files. When I spot an IP address downloading over 500MB during a month I take a look at where they are located. If they are in China or some other countries I may block them. I can't afford to have people sucking down huge parts of my site if it's going to cost me more for bandwidth.
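
If the hosting stats page is not detailed enough, the same per-IP bandwidth check can be done on the raw log directly. This is a rough Python sketch, assuming Apache's common/combined log layout; the 500 MB limit mirrors the figure mentioned above, and the sample lines and function names are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: client IP first, then the quoted request,
# then the status code and the response size in bytes.
IP_AND_SIZE = re.compile(r'^(?P<ip>\S+) .*?" (?P<status>\d{3}) (?P<size>\d+|-)(?: |$)')


def bytes_per_ip(lines):
    """Sum the logged response sizes for each client IP."""
    totals = Counter()
    for line in lines:
        match = IP_AND_SIZE.match(line)
        if match and match["size"] != "-":  # "-" means no body was sent
            totals[match["ip"]] += int(match["size"])
    return totals


def heavy_users(lines, limit_bytes=500 * 1024 * 1024):
    """Return (ip, total_bytes) pairs above the limit, largest first."""
    over = [(ip, total) for ip, total in bytes_per_ip(lines).items() if total > limit_bytes]
    return sorted(over, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample: one client pulls a 700 MB archive, another only a page.
sample = [
    '10.0.0.1 - - [03/Jul/2008:05:00:00 -0700] "GET /files/big.rar HTTP/1.1" 200 734003200 "-" "ExampleCopier/1.0"',
    '10.0.0.1 - - [03/Jul/2008:05:09:00 -0700] "GET /header.htm HTTP/1.1" 200 2048 "-" "ExampleCopier/1.0"',
    '10.0.0.2 - - [03/Jul/2008:05:10:00 -0700] "GET / HTTP/1.1" 200 5120 "-" "ExampleBrowser/1.0"',
]

for ip, total in heavy_users(sample):
    print(ip, round(total / 2**20), "MiB")
```

The totals only cover what the server actually logged, so they are a lower bound if entries are being dropped.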
Old 07-07-2008, 11:00 PM   · #6
Dr_Test
Hmm, okay, thanks for the help, all.

New confusion to add to the mix:

An IP that has the User Agent of the Google Bot (IP: 66.249.70.104 / Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)) has been downloading stuff from my site, though I have ALL bots disabled.

My robots.txt says this:
Quote:
User-agent: *
Disallow: /archive/
Disallow: /files/
Disallow: /

(Notice that last rule, which should block ALL the bots, at least the ones that respect robots.txt, from all dirs, not to mention the ones above, which are *still* being entered.)
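
One way to sanity-check rules like these is to feed them to a robots.txt parser and ask whether a given path is allowed. The sketch below uses Python's standard `urllib.robotparser`; the host `example.com` and the helper name `is_allowed` are placeholders, and the result only tells you what a well-behaved crawler should do, not what any particular bot actually does.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, without the explanatory note.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /files/
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com stands in for the real site.
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/files/Movies/deftonesvid.zip"))
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/index.htm"))
```

With `Disallow: /` in place, both checks should come back as not allowed, so a crawler that still fetches these files is either ignoring the rules or has not re-read robots.txt.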



But look what this IP is doing:
Quote:
66.249.70.104 - - [03/Jul/2008:05:25:48 -0700] "GET /files/Thief/Faceless%20Part2%20-%20Ingame.zip HTTP/1.1" 206 16777216 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.70.104 - - [03/Jul/2008:05:27:15 -0700] "GET /files/Movies/deftonesvid.zip HTTP/1.1" 206 16777216 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
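
For reading entries like the two above: the number after the quoted request is the HTTP status and the next number is the bytes sent. Status 206 is "Partial Content", meaning the client asked for a byte range rather than the whole file, and 16777216 bytes is exactly 16 MiB. A small Python sketch that pulls those fields out (the function name `summarize` is made up, and the combined log format is assumed):

```python
import re
from urllib.parse import unquote

# Quoted request, then status code, then bytes sent.
ENTRY = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(line):
    """Return path, status, partial flag and size in MiB for one log line."""
    match = ENTRY.search(line)
    if match is None:
        return None
    size = 0 if match["size"] == "-" else int(match["size"])
    status = int(match["status"])
    return {
        "path": unquote(match["path"]),  # turns %20 back into spaces
        "status": status,
        "partial": status == 206,        # 206 = Partial Content (range request)
        "mib": size / 2**20,
    }


# First log line quoted in the post above.
line = (
    '66.249.70.104 - - [03/Jul/2008:05:25:48 -0700] '
    '"GET /files/Thief/Faceless%20Part2%20-%20Ingame.zip HTTP/1.1" 206 16777216 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
print(summarize(line))
```

So each of these entries records one 16 MiB chunk, not a complete download, which fits the earlier remark that some programs make many requests to copy a single file.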



I notice also that while IP's with "googlebot" user agents have been visiting and downloading all month long, they haven't made a single grab for my robots.txt file, as far as I see in my logs. The Yahoo bot has been hitting nothing BUT that file, and leaves immediately after.

Argh... I just wish the whole thing were more simple.

Last edited by Dr_Test : 07-07-2008 at 11:15 PM.
Old 07-07-2008, 11:03 PM   · #7
weblord
So if you don't want search engines to index your site, try putting that IP in your IP Deny Manager and also in a deny entry in .htaccess.

Do you have a firewall installed? If so, block that IP there as well.

Old 07-08-2008, 01:18 AM   · #8
Dr_Test
Ok, I blocked a bunch of IP's... I'll see what happens.

Btw, I DID manage to figure out for sure that my raw logs are not logging everything. My site has had 300 visits so far this month, so I did a text search in the log for one of the images on the front page, and it only came up about 30 times. Also, I examined some visits carefully and noticed that not all of the images on index.htm were reported as downloaded. Usually they were, but about 1/4 of the visitors are reported as downloading only a PORTION of the images (which total about 20k combined; it's a fast-loading page).

Last edited by Dr_Test : 07-08-2008 at 01:23 AM.
Old 07-08-2008, 01:26 AM   · #9
weblord
It would also help to change the filenames of the frequently downloaded files you suspect are being targeted. The same goes for any other huge files you might have. You could also apply some basic form of encryption if you're not into SEO.

Also, instead of linking directly to the file offered for download, put some download tracking software in between, like
http://www.whatcounter.com/
It is free and can be used to count actual downloads. Better still, do what some download sites do and put a basic form (only email and/or first name) or a captcha in front of the download, anything to discourage mass downloads.

Another thing is to report these attackers to their ISP or hosting provider. That can at least delay their abusive actions on your site while you take the time to implement the tips above.

Give us an update.


Old 07-08-2008, 08:21 AM   · #10
nielsencl
Quote:
Btw, I DID manage to figure out for sure that my raw logs are not logging everything. My site has had 300 visits so far this month, so I did a text-search in the log for one of the images on the front page, and it only came up about 30 times. Also, I examined some visits carefully, and noticed that not all of the images on index.htm were reported as downloaded.

What you are seeing may or may not be an indication that not all the traffic is being logged.

- When a visitor hits a page with images, the page file and the image files are all requested, UNLESS the visitor has the images turned off (rare, but it is possible).

- The second time that same visitor hits the same page, it may only load the page file and load the graphics from their cache. Also if the same graphic is used on all pages, it may only get loaded once.

- But the most likely thing you are seeing is spider and bot traffic. In general they don't care about your graphics and so will only load your pages. For your site, they may only load your zip and other download files.

Blocking by IP is what I was going to suggest. If you use the network lookup at Network-Tools.com it can show you information about the visitor and help you to decide if you really want to block them or not. And it can also show you if the "visitor" is really a bot if the IP is located at an ISP like "The Planet". Then you can block all traffic from there, since it's not likely to be users (although it may be user proxy traffic). When you get a range to block like
NetRange: 66.249.64.0 - 66.249.95.255

Just enter in
66.249.64.
66.249.65.
66.249.66.
etc.
etc.
66.249.95.

to keep them all out.
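
Typing out 32 prefixes by hand is error-prone, so here is a minimal Python sketch that derives them from the NetRange with the standard `ipaddress` module. It also prints the equivalent CIDR block, which is shorter to enter where the blocking tool accepts CIDR notation (an assumption about your host's tools). The function names are made up for illustration.

```python
import ipaddress


def range_to_cidr(first, last):
    """Collapse an inclusive IPv4 range into the smallest list of CIDR blocks."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return list(ipaddress.summarize_address_range(start, end))


def slash24_prefixes(first, last):
    """List the 'a.b.c.' prefixes covering the range, one per /24 network."""
    prefixes = []
    for network in range_to_cidr(first, last):
        if network.prefixlen > 24:
            # A block smaller than a /24 cannot be written as an 'a.b.c.' prefix.
            raise ValueError(f"{network} is narrower than a /24")
        for subnet in network.subnets(new_prefix=24):
            prefixes.append(str(subnet.network_address).rsplit(".", 1)[0] + ".")
    return prefixes


print(range_to_cidr("66.249.64.0", "66.249.95.255"))
for prefix in slash24_prefixes("66.249.64.0", "66.249.95.255"):
    print(prefix)
```

For the range above this gives a single block, 66.249.64.0/19, and the 32 prefixes from 66.249.64. through 66.249.95.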

And since Googlebot seems awfully interested in your zip files, I would either contact Google about this or add a line to robots.txt that names Googlebot. It should be all you need. Perhaps it's being spoofed and it's not really Google...?
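
On the "is it really Google?" question, the check Google describes for verifying its crawler is a reverse DNS lookup on the IP, a look at the domain of the returned hostname, and then a forward lookup to confirm the hostname maps back to the same IP. Below is a hedged Python sketch of that idea using the standard `socket` module. The function name is made up, the accepted domain suffixes are an assumption to check against Google's current documentation, and the resolver arguments exist only so the lookups can be swapped out.

```python
import socket

# Assumed: genuine Googlebot hostnames end in one of these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def looks_like_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for an IP that claims to be Googlebot."""
    try:
        hostname = reverse(ip)[0]  # reverse lookup: IP -> hostname
    except OSError:
        return False  # no PTR record or the lookup failed
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(hostname)[2]  # forward lookup: hostname -> IPs
    except OSError:
        return False
    return ip in addresses
```

Calling `looks_like_googlebot("66.249.70.104")` performs live DNS lookups, so the answer depends on your resolver at the time you run it. A user-agent string alone proves nothing, since any client can send one.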

Finally, remember that Robots.txt is pretty good for keeping out robots and spiders, but it also works in reverse with people. If you want to know where people have stuff that you might be interested in getting, the first place to look is the robots.txt. :-(

One thing that can work well is to create a directory with a password-type name, like /883uJhh44-H3. Then all you have to do is control how people/bots learn about this directory and only tell people you want to know about it. The nice thing is that it's easy to change the folder name from time to time and keep old users out. Just make sure "directory browsing" is not enabled.
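
If you go the hard-to-guess directory route, the name is better generated than invented. A minimal Python sketch using the standard `secrets` module (the function name and the default length are arbitrary choices for illustration):

```python
import secrets


def random_directory_name(n_bytes=9):
    """Return a URL-safe, hard-to-guess directory name such as '/Qx3...'."""
    # token_urlsafe uses only letters, digits, '-' and '_',
    # so the result is safe to use in a URL path.
    return "/" + secrets.token_urlsafe(n_bytes)


print(random_directory_name())
```

This is obscurity rather than access control: anyone who learns the link can still download, which is why rotating the name from time to time matters.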
Old 07-08-2008, 06:28 PM   · #11
Dr_Test
Thanks for the pointers...
Btw:

Quote:
And since Googlebot seems awfully interested in your zip files, I would either contact Google about this...


That's what I've been trying to do, but I can't find any contact information. Any ideas? I tried their extensive Help and support sections, but they are highly frustrating to navigate, and seem bent on diverting you to endless help pages, with no contact info in sight.
All times are GMT -7.

