We have all sorts of alarms that trigger if anything is slower than usual on our end.
Typically, these slowdowns are the result of transient network issues. We use Cloudflare as our primary CDN, so if their network is having trouble, you may notice slowdowns accessing NamePros.
You can view their network status here. Issues within Cloudflare's network are almost exclusively regional, and are typically resolved or worked around within an hour.
We track how long requests take to process on our servers. Here's what the breakdown looks like over the last 15 minutes:
The horizontal axis is the amount of time it took to process the request in seconds. The vertical axis is the relative number of requests. Notice that a lot of requests fall in the "0 to 0" range; these are requests that the server spent no measurable time processing (less than 0.001 seconds). Starting at 0.025 sec, you can see requests that weren't cached and required database access.
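Here is a minimal Python sketch of how request durations could be grouped into buckets like the ones in that chart. The bucket edges in `BUCKET_EDGES` are assumptions for illustration, since the chart's actual buckets aren't listed. The 0.025-second edge mirrors the point where uncached, database-backed requests start to appear.

```python
import bisect
from collections import Counter

# Hypothetical bucket edges in seconds; not the exact buckets used by our monitoring.
BUCKET_EDGES = [0.001, 0.005, 0.025, 0.1, 0.5, 1.0]


def bucket_label(duration: float) -> str:
    """Label the half-open bucket [lower, upper) that contains `duration` (seconds)."""
    i = bisect.bisect_right(BUCKET_EDGES, duration)
    lower = 0.0 if i == 0 else BUCKET_EDGES[i - 1]
    if i == len(BUCKET_EDGES):
        return f"{lower}+"  # everything at or above the last edge
    return f"{lower} to {BUCKET_EDGES[i]}"


def histogram(durations):
    """Count how many request durations fall into each bucket."""
    return Counter(bucket_label(d) for d in durations)


# Made-up sample: three near-instant cached hits and three slower requests.
samples = [0.0004, 0.0008, 0.0002, 0.03, 0.04, 0.2]
print(histogram(samples))
```

Anything under the first edge (0.001 s) lands in the lowest bucket, which is the "no measurable time" group described above.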
This isn't the whole picture, though. There are still at least four other factors that come into play:
- Load balancer delay
- Network travel time
- Time your browser spends processing the data it receives
- Time your browser spends taking that data and turning it into the image that appears on your screen
The first additional factor is probably the least obvious, but it's essential to the stability of NamePros. We have a number of separate web servers processing requests at any given time; typically no fewer than 3 and no more than 30. Cloudflare doesn't send your request directly to these servers, though. Instead, we have two servers in between that act as load balancers. They assess the health of each web server and ensure that only the healthiest servers ultimately receive requests. If a load balancer determines that a web server is unhealthy (it's too slow, it's not responding, or anything else we measure seems unusual), it automatically takes that server offline and replaces it with a new one, routing requests to other, healthier servers in the meantime. Load balancer delay is almost always negligible, to the point that we can't reliably measure it, so we typically group it in with network travel time. However, if the load balancer delay were to become excessive, we'd be notified, and the load balancers would be automatically replaced. (This has only happened twice to NamePros, as far as I can remember.)
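The health-check-and-replace behavior can be sketched in Python as a toy model. This is not our actual load balancer configuration: the `WebServer` and `LoadBalancer` classes, the `MAX_LATENCY_MS` threshold, and round-robin selection are all hypothetical choices made for illustration.

```python
import itertools
from dataclasses import dataclass

# Hypothetical health threshold; real checks look at many more signals than latency.
MAX_LATENCY_MS = 500.0


@dataclass
class WebServer:
    name: str
    responding: bool = True
    latency_ms: float = 50.0


def is_healthy(server: WebServer) -> bool:
    """A server is healthy if it responds and isn't too slow."""
    return server.responding and server.latency_ms <= MAX_LATENCY_MS


class LoadBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self._turn = itertools.count()

    def check_health(self):
        """Pull unhealthy servers out of rotation and return their names for replacement."""
        unhealthy = [s.name for s in self.servers if not is_healthy(s)]
        self.servers = [s for s in self.servers if is_healthy(s)]
        return unhealthy

    def add_server(self, server: WebServer):
        """Bring a freshly provisioned replacement into rotation."""
        self.servers.append(server)

    def route(self) -> WebServer:
        """Round-robin across servers that currently pass the health check."""
        pool = [s for s in self.servers if is_healthy(s)]
        if not pool:
            raise RuntimeError("no healthy web servers available")
        return pool[next(self._turn) % len(pool)]


lb = LoadBalancer([WebServer("web-1"), WebServer("web-2", latency_ms=900.0), WebServer("web-3")])
print(lb.check_health())                    # the slow server is taken out of rotation
print([lb.route().name for _ in range(4)])  # traffic alternates between healthy servers
lb.add_server(WebServer("web-4"))           # the replacement joins the pool
```

Routing re-checks health on every request, so traffic keeps flowing to healthy servers while a replacement is being brought up.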
We run JavaScript in your browser that we use to track the last three factors, since we can't measure them from our servers. Your browser will measure the time it takes for each request to process, which our JavaScript will then report back for our metrics. (Most ad blockers will prevent this from functioning properly, however.) It's a little like Google Analytics and similar services.
Here's what we've measured over the past 24 hours:
In this chart, "Network" time includes time that the server spent processing the request, as your browser has no way of knowing the difference. However, we've already established that the time the server spends processing requests is negligible.
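Browsers expose these timings through the Navigation Timing API (for example, `performance.getEntriesByType("navigation")` in JavaScript). The Python sketch below assumes those timestamps have already been reported to a collector as a dictionary keyed by `PerformanceNavigationTiming` attribute names. It shows one plausible way to split them into phases; the phase names, boundaries, and sample values are assumptions, not necessarily how our charts divide them. As noted above, "network" here also includes server processing time.

```python
def breakdown(t: dict) -> dict:
    """Split Navigation Timing-style timestamps (ms) into coarse phases.

    Keys follow PerformanceNavigationTiming attribute names; the grouping is illustrative.
    """
    return {
        # Request sent and full response received; includes server processing,
        # which the browser cannot separate from travel time.
        "network": t["responseEnd"] - t["fetchStart"],
        # Parsing the HTML and building the DOM up to DOMContentLoaded.
        "dom_processing": t["domContentLoadedEventEnd"] - t["responseEnd"],
        # Remaining work (such as loading images and other subresources)
        # until the load event finishes.
        "page_rendering": t["loadEventEnd"] - t["domContentLoadedEventEnd"],
        # From fetch start until the load event finishes.
        "total": t["loadEventEnd"] - t["fetchStart"],
    }


# Hypothetical sample from one page view, in milliseconds.
sample = {
    "fetchStart": 0.0,
    "responseEnd": 420.0,
    "domContentLoadedEventEnd": 1900.0,
    "loadEventEnd": 3100.0,
}
print(breakdown(sample))
```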
Most of the spikes we see on this chart are the result of network issues. Additionally, page rendering time rises and falls over a 24-hour cycle, with the trough lasting for the duration of the workday in American time zones. This reflects the average device quality we see from each country. Time zones that include less affluent countries accounting for significant traffic to NamePros tend to produce the higher points. (Remember, this isn't a generalization; there are plenty of people in those countries with powerful devices. It only represents an average.) We don't get all that much traffic from South America, but we do get a lot of traffic from Africa, whose time zones largely overlap Europe's. That explains why America's daytime is the low point, rather than Europe's daytime.
Notice that we're now dealing with seconds, rather than milliseconds. Five seconds might sound like a huge delay, but that's pretty low relative to other websites: it covers the full time from when you click until the page is completely loaded, rendered, and interactive.
We take these times that we measure and use them to create an Apdex score:
This score compares the time each request actually took against a target time we expect it to take: requests within the target count as satisfied, requests within four times the target count as tolerating, and anything slower counts as frustrated. The chart above is our Apdex score over a 24-hour period. Typically, we want the average to stay within the green and blue areas at the top; anything lower than that for an extended period of time signifies a slowdown.
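Apdex is a published standard, and its formula is simple enough to show directly. With a target threshold T, responses up to T count as satisfied, responses up to 4T count as tolerating (half credit), and slower ones count as frustrated. The Python below is a minimal sketch of that formula; the threshold and sample times are hypothetical, since the target we use isn't stated here.

```python
def apdex(durations, t):
    """Apdex score in [0, 1] for `durations` against target threshold `t` (same units)."""
    if not durations:
        raise ValueError("need at least one sample")
    if t <= 0:
        raise ValueError("threshold must be positive")
    satisfied = sum(1 for d in durations if d <= t)
    tolerating = sum(1 for d in durations if t < d <= 4 * t)
    # Frustrated samples (> 4t) contribute nothing to the numerator.
    return (satisfied + tolerating / 2) / len(durations)


# Hypothetical page load times in seconds, with an assumed target of 2 s.
times = [1.0, 1.5, 3.0, 7.0, 9.0]
print(apdex(times, 2.0))  # (2 satisfied + 2 tolerating / 2) / 5 = 0.6
```

A score of 1.0 means every request met the target; a sustained drop toward 0.5 or below is the kind of dip that signals a slowdown.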
We also break down page load times by percentile:
As expected, the average page load time is acceptably low (< 10s). The median page load time is even lower because so many of our requests are cached. The 95th percentile hovers around 15s, meaning 95% of requests load within 15s. The 99th percentile jumps around; this is expected, as somewhere around the world, someone is probably encountering network issues at any given moment. When a network issue affects a larger number of visitors, we also see the 95th percentile spike (that's what happened around 12:30 AM).
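The exact percentile method behind the chart isn't stated, so the Python below uses the nearest-rank definition, one common choice (others interpolate between samples). The load times are hypothetical.

```python
import math
import statistics


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical page load times in seconds; one visitor hit a bad connection.
loads = [1.2, 2.0, 2.5, 3.1, 4.0, 4.4, 5.0, 6.8, 9.5, 30.0]
print("mean  ", statistics.mean(loads))
print("median", statistics.median(loads))
print("p95   ", percentile(loads, 95))
```

With only ten samples, the single 30-second outlier sets the 95th percentile and pulls the mean well above the median, which is one reason a median can sit below the average.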
@alcy, you're probably spending a lot of time in the 99th percentile, unfortunately.
Nothing in any of these charts looks unusual; the past 24 hours have been pretty calm. We have hundreds of other metrics that we monitor, but none of them indicate a problem. This leaves us with several possibilities for each user who is experiencing issues:
- Localized network issues, often affecting only a single ISP within a small geographical area (e.g., a single city). There are always localized network issues; such is the nature of the internet. For something on which we've come to rely so heavily, it's amazingly unstable. There's really nothing we can do about this, unfortunately.
- Regional network issues resulting from a problem on Cloudflare's end. These affect a larger number of people, but Cloudflare does a very good job of recording these incidents in a timely manner. Should a major issue arise, we are prepared to use a backup CDN. This probably isn't the issue, as a regional problem usually produces, at the very least, a sustained plateau in the 95th percentile.
- Problems with your computer or home/corporate network. These cases often result from out-of-date browsers, poorly designed antivirus software, browser extensions/toolbars, or very old hardware. Sometimes it's as simple as a dying router, or a Wi-Fi connection that becomes unstable whenever someone turns on a nearby microwave oven. When these problems arise, all sites are usually affected, though there are occasional exceptions. It's unusual for such a problem to affect only NamePros.
- An incompatibility between NamePros and whatever browser you're using. Browsers change, as does our website. From time to time, our site will break in specific versions of a specific browser. We regularly run tests in different browsers, and some of these tests are automated, so we'll be notified quickly if we break something; still, we can't possibly test every configuration in use on our site. To ensure our site works in less common browsers and browser versions, we rely on our users to help us spot correlations. For example, if everyone who's experiencing a problem is using Opera, we'll need to run additional tests in Opera.
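Spotting that kind of correlation mostly comes down to comparing each browser's share of problem reports with its share of normal traffic. The Python below is a hypothetical sketch of that comparison; the function name, the `min_reports` and `ratio` cutoffs, and the traffic shares are all made up for illustration.

```python
from collections import Counter


def overrepresented(reports, baseline_share, min_reports=5, ratio=2.0):
    """Browsers whose share of problem reports is at least `ratio` times their traffic share.

    reports: one browser name per user reporting the problem.
    baseline_share: fraction of overall traffic per browser (hypothetical numbers).
    """
    counts = Counter(reports)
    total = sum(counts.values())
    flagged = []
    for browser, n in counts.items():
        if n < min_reports:
            continue  # too few reports to call it a pattern
        report_share = n / total
        traffic_share = baseline_share.get(browser, 0.0)
        if traffic_share == 0.0 or report_share / traffic_share >= ratio:
            flagged.append(browser)
    return sorted(flagged)


reports = ["Opera"] * 6 + ["Chrome"] * 3 + ["Firefox"]
traffic = {"Chrome": 0.65, "Firefox": 0.10, "Safari": 0.20, "Opera": 0.02}
print(overrepresented(reports, traffic))  # ['Opera']
```

A browser that shows up roughly in proportion to its normal traffic isn't flagged, since that pattern says little about compatibility.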
@creataweb, a number of your visits are coming from a notoriously unreliable ISP. Others are coming from public Wi-Fi access points and mobile connections. All of these have a tendency to introduce sporadic issues that will affect the speed of NamePros and other sites. You may notice it on NamePros because we run continuous checks for new messages and alerts, but most likely it will affect all sites.
@alcy, you said you need to open a new tab sometimes when the problem occurs. This sounds to me like a Chrome bug. I've run into a similar problem that results from an incompatibility between Chrome and my graphics driver. It's existed for several major versions now; sometimes it doesn't happen much, other times I can barely use Chrome. It seems to affect most websites. The current Chrome tab will freeze, refusing to navigate anywhere even if I manually type an address in the address bar. The only workaround is to close the tab and open a new one. I've also noticed that the same issue occurs if I have an unreliable connection to my ISP's DNS server.