How to Beat Common WordPress Development Performance Bottlenecks
At Flying Hippo, we love WordPress. Not only is it our CMS du jour, but its native functionality provides the backbone for many more-complicated features our clients often request.
Over the years, we’ve discovered numerous fixes and workarounds to improve the functionality and efficiency of our sites without sacrificing form for function. In this post, I’ll discuss some of these key findings and explain how each of these potential bottlenecks can be avoided or corrected.
The WordPress stack
In a normal WordPress environment, PHP provides the functionality while MySQL stores the vast majority of data. Variants of this — hosting environments such as Linux or Windows; web servers provided by Apache, nginx, or IIS — usually suffer from the same general pitfalls and constraints. For the most part, it is enough to know that PHP and MySQL are the major components (and hence, the areas where issues are frequently in need of redress).
A post (inclusive of all types, including the generic Post and Page types, Menu items, and custom post types) is represented as a row in the database. Any data not contained in that single row must also fan out to other records; this encompasses custom fields, taxonomies (tags, categories, etc.), user information, and many other components not otherwise listed.
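To make that fan-out concrete, here is a small sketch using standard WordPress functions (the post ID and the "event_date" field are hypothetical): the post itself is one row in wp_posts, while its custom fields and taxonomy terms each require lookups against other tables.

```php
<?php
// The post itself is a single row in the wp_posts table.
$post = get_post( 42 ); // hypothetical post ID

// A custom field lives in a separate wp_postmeta row, keyed by the post ID.
$event_date = get_post_meta( $post->ID, 'event_date', true );

// Taxonomy terms fan out further still, into wp_terms and related tables.
$categories = wp_get_post_terms( $post->ID, 'category' );
```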
Some resources, such as PHP code or images, are stored instead on the filesystem. On the majority of requests made to your website (including AJAX calls), the bulk of your WordPress PHP codebase, along with any active plugins and theme functions, is pulled in and evaluated, accounting for much of the CPU load on a hosting environment. The natural exception is a cached site, where the work is done only periodically and the resulting response is saved for repeat visits (at the expense of memory and/or storage, coupled with the overhead of generating that cached copy).
Some basic performance considerations
Lazy-loading and asynchronous requests (“AJAX”)
The primary concern of most websites is the perceived responsiveness. If a single request becomes resource-intensive or performs poorly, it may help to break it up into several smaller tasks that occur simultaneously (asynchronously).
To give an example, a web page that takes three seconds (3000ms) to come back could perhaps be described as:
- 200ms: Generate a page header with custom menus and dynamic information generated at runtime
- 700ms: Retrieve post and custom fields from the database and use them to generate the page structure
- 1750ms: Search database for all Events (fictitious post type that represents a calendar event) that happen after today, sort them, and generate a Calendar widget that lists them
- 350ms: Generate sidebars and footers, including other assorted widgets and navigational elements
The problem should be readily apparent: It’s taking 1750 milliseconds (almost two full seconds) to produce the Calendar widget. If, for the sake of discussion, it was implausible to reduce the time it takes to generate that widget, it might make more sense to skip that element in the first pass, and “lazily load” it after the page has finished rendering. A revised version of this page might look more like:
- 200ms: Generate a page header with custom menus and dynamic information generated at runtime
- 700ms: Retrieve post and custom fields from the database and use them to generate the page structure
- 350ms: Generate sidebars and footers, including other assorted widgets and navigational elements (the page has finished loading at this point [1250 ms], and the user can view its contents sans Calendar widget)
- 1950ms: AJAX request to search database for all Events (fictitious post type that represents a calendar event) that happen after today, sort them, and generate a Calendar widget that lists them
A sharp eye will notice that the total time in our fictitious example has actually increased by 200ms, due in part to the fact that the separate AJAX request requires the server to load the entire PHP codebase all over again. Despite this additional time, however, the user will likely perceive the site as having loaded faster. The user experience will be that the page comes back after a little more than a second (1250ms), and sometime later the Calendar will appear. In all likelihood, the user will not miss the Calendar during that time. So, we’ve improved the overall experience.
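As a rough sketch of what the server side of that lazy load might look like, the handler below registers an admin-ajax.php action that builds the Calendar widget only when the browser asks for it. The "event" post type, the "event_date" field, and the "my_calendar" action name are all hypothetical stand-ins, not part of any real site.

```php
<?php
// functions.php — build the Calendar widget on demand instead of during the main page load.
// The 'event' post type, 'event_date' field, and 'my_calendar' action are hypothetical.
function my_render_calendar_widget() {
    $events = get_posts( array(
        'post_type'      => 'event',
        'posts_per_page' => -1,
        'meta_key'       => 'event_date',
        'orderby'        => 'meta_value',
        'order'          => 'ASC',
        'meta_query'     => array(
            array(
                'key'     => 'event_date',
                'value'   => date( 'Y-m-d' ),
                'compare' => '>=',
                'type'    => 'DATE',
            ),
        ),
    ) );

    $items = '';
    foreach ( $events as $event ) {
        $items .= '<li>' . esc_html( get_the_title( $event ) ) . '</li>';
    }

    // Return the widget markup as JSON; the page's JavaScript injects it after load.
    wp_send_json_success( '<ul class="calendar-widget">' . $items . '</ul>' );
}
add_action( 'wp_ajax_my_calendar', 'my_render_calendar_widget' );
add_action( 'wp_ajax_nopriv_my_calendar', 'my_render_calendar_widget' );
```

On the front end, a small script would then request admin-ajax.php with action=my_calendar once the page has rendered, and drop the returned markup into the Calendar's placeholder element.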
Algorithms and code efficiency
Frequently, the problem with performance in any given site can be traced back to poor coding technique. Moore’s Law is partly to blame for this phenomenon — with the vast leaps in performance over many years of computing, and an ever-increasing amount of resources available to a computer, there is less emphasis in development on writing for performance constraints.
Of course, the boon of resources isn’t available to everybody; embedded development is a continuing field in which coders struggle to squeeze the most out of size and performance. In several respects, their efforts hearken back to an older age in computing, where 64KB of RAM was considered bleeding edge and more than sufficient for all but the most taxing of programs.
So what went wrong? It is partially out of the hands of the rank-and-file developer: the platforms (Windows, Linux, Mac OS X) tend to grow more monolithic with successive generations, built on the assumption of ever-increasing resources (though the push toward mobile applications is reversing that trend), as do the software libraries that serve as the scaffolding of complex software. More prevalent, though, is a general indifference among coders to how plentiful resources actually are. Developers’ machines are often among the most powerful, with considerable CPU and memory at their disposal, and tend not to accurately reflect the real-world environments their code inevitably gets deployed to. (In other words: Developers can be a bit out of touch.)
Another issue is the ease with which a person can learn to program. A large number of coders are self-taught — not necessarily a bad thing in and of itself, but a formal education does typically cover the more esoteric topics of algorithmic performance and the impact of subtle mistakes. For an illustration of how a simple process can quickly go awry when the process itself (the “algorithm”) is poorly understood or thought out, see the story of Schlemiel the Painter.
Loops
Much code takes place in one loop or another. Most languages have “while” or “for” constructs that continuously execute until a condition is met. Others can accomplish this task by setting a landmark (a “label” in assembly languages) that they can return to at need (using a “jump” or “goto”).
One common mistake is to continue running through the loop after it has accomplished what it set out to do, or to otherwise run through it more times than necessary. By way of illustration, suppose you need to add together all multiples of 5 between 0 and 100. Three approaches to this problem are shown below:
Approach 1 (the slow way)
You start at zero, and each time you loop through you add 1 to your current place. Then, you look at your current place, and if you’re at a multiple of 5, you add that number to your running total.
- Start at 0
- Increment by 1
- Are we a multiple of 5?
  - Yes: Increment our running total by the current amount
  - No: Do nothing
- Are we at 100 yet?
  - No: Go back to #2
  - Yes: Stop, we’re done.
We’ve run through the loop (steps 2 through 4) 100 times before we get to the end. There are only 20 numbers in that loop we cared about, though (the multiples of 5).
Approach 2 (the fast way)
You do something similar to above, but you increment by 5 instead of 1 each time:
- Start at 0
- Increment by 5
- Increment our running total by the current amount
- Are we at 100 yet?
  - No: Go back to #2
  - Yes: Stop, we’re done.
Our loop (again, steps 2 through 4) only runs 20 times.
Approach 3 (the “clever” way)
We can actually alter approach 1 slightly to make it as efficient as #2:
- Start at 0
- Increment by 1
- Increment our running total by the current amount, multiplied by 5
- Are we at 20 yet?
  - No: Go back to #2
  - Yes: Stop, we’re done.
We keep the increment of 1 from the first approach, but only run the loop 20 times. A little math shows that multiplying each number from 1 to 20 by 5 produces every multiple of 5 up to 100. In reality, this is the same algorithm as approach #2, written in a subtly different way (the multiplication by 5 was simply moved from step 2 to step 3).
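In PHP, the three approaches might be sketched like this (a toy example; only the final sum matters):

```php
<?php
// Approach 1: visit every number from 1 to 100, keeping only multiples of 5 (100 iterations).
$total = 0;
for ( $i = 1; $i <= 100; $i++ ) {
    if ( 0 === $i % 5 ) {
        $total += $i;
    }
}

// Approach 2: step by 5, so we only ever visit the numbers we care about (20 iterations).
$total = 0;
for ( $i = 5; $i <= 100; $i += 5 ) {
    $total += $i;
}

// Approach 3: step by 1, but multiply inside the loop (also 20 iterations).
$total = 0;
for ( $i = 1; $i <= 20; $i++ ) {
    $total += $i * 5;
}

echo $total; // 1050 in every case
```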
“Round Trips”
A “round trip” describes the process of starting from one resource, querying another, and pulling the results back into the first. Common examples include database requests from PHP (a line of code runs a SQL query and retrieves the results) and a browser requesting a resource from a server (an AJAX call, or pulling in an image or other asset). Because of the overhead involved in each trip from one resource to the next, these should be treated with caution and made only when necessary: for optimal performance, make as few round trips as possible.
Some of these situations can be avoided by carefully adjusting what you’re querying for. For instance, consider an archive page with the 10 most recent posts (a common scenario in WordPress). One approach to this might be to first identify which posts are the 10 most recent, then for each one return to the database and ask for the details about each post (11 round trips to the database). A cleaner approach would be a single SQL query that both enumerates the 10 most recent posts and returns all of the information about them in its response (making for a single round trip). That one response will be larger in size than the individual requests separately, but should prove considerably faster in the long run.
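Here is a rough sketch of the difference using WordPress’s $wpdb object; the queries are simplified for illustration.

```php
<?php
global $wpdb;

// The chatty version: one query for the IDs, then another query per post — 11 round trips.
$ids   = $wpdb->get_col(
    "SELECT ID FROM {$wpdb->posts}
     WHERE post_type = 'post' AND post_status = 'publish'
     ORDER BY post_date DESC LIMIT 10"
);
$posts = array();
foreach ( $ids as $id ) {
    $posts[] = $wpdb->get_row(
        $wpdb->prepare( "SELECT * FROM {$wpdb->posts} WHERE ID = %d", $id )
    );
}

// The single-trip version: ask for everything we need in one query.
$posts = $wpdb->get_results(
    "SELECT * FROM {$wpdb->posts}
     WHERE post_type = 'post' AND post_status = 'publish'
     ORDER BY post_date DESC LIMIT 10"
);
```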
The same consideration applies when making a page AJAX-heavy. Each request (also considered a “thread” in this context) may run simultaneously, or the server may queue them in the order they were received. How the server performs under these conditions depends heavily on its configuration (one CPU core or many, the threading model of the web server, etc.). This is compounded by the data overhead (HTTP headers, Ethernet frames) of each request, as well as the processing overhead on the server for each of those requests. As with the database query, there are frequently opportunities to bundle things together to increase efficiency, but this must be balanced against the perceived execution time. In other words, we’re seeing the other side of the lazy-loading coin.
There are no straight answers here, as each situation must be evaluated individually. However, a less-seasoned programmer might fail to identify this as an area of concern and thus overlook it when tuning for performance.
Caching
As previously mentioned, PHP is an interpreted language: code is evaluated at run-time, and the size of that runtime footprint is in large part determined by “includes” within the code — for instance, WordPress may conditionally use a PHP file in one situation but not in another, so the amount of code loaded can differ based on the nature of the request. Nowhere is this more readily apparent than in WordPress plugins, which are PHP codebases that are dynamically loaded when present AND active. Because of this, activating a poorly written plugin can have a devastating effect on performance.
Additionally, if the site’s content does not change frequently and is consistent for long periods of time, each query against the database will continue to return the same data it did the previous times it was requested. While a sophisticated relational database management system (RDBMS) will perhaps realize that the data is unchanged since the last such request and return a cached copy, it is safer to assume that it must perform all the same work each time that it did the first time around.
In such situations, it may be ideal to implement one or more caching layers. These can be implemented in several ways, and while it isn’t feasible to account for every situation, some occasions where caching could be useful are illustrated as follows:
Third-party data
A situation often encountered in our own development is the use of a third-party API or data source. This has come in various forms — iCal event feeds, RSS feeds, Facebook or Twitter integration, and the occasional third-party CRM such as InfusionSoft. The workflow for this integration might look like:
- User requests from your site
- Your site receives the request
- Your site makes a request to the third-party API
- The third-party API responds
- Your site processes the response
- Your site responds to the user
In the above scenario, half of the steps (3 through 5) account for the work necessary to handle the third-party integration. Consider the matter from a geographical or logistical standpoint: the user might be in Los Angeles, CA, your website may be hosted in Chicago, IL, and the third-party application might be hosted in Toronto, Ontario, Canada. Supposing theoretical hops of 60ms between Los Angeles and Chicago and 50ms between Chicago and Toronto, and a single request/response needed for any communication between two points, a given round trip would map out to:
- Los Angeles to Chicago (steps 1, 2): 60ms
- Chicago to Toronto (step 3): 50ms
- Toronto to Chicago (step 4): 50ms
- Chicago to Los Angeles (step 6): 60ms
The above comes to a total of 220ms. Were the third-party application removed from the process, the number drops to 120ms — almost halving the time. In reality, any of those numbers could quickly fluctuate, as the number of round-trip sequences necessary between any two points could increase as the situation requires; handshaking, authentication, and the nature of the workload could all combine to increase the number of trips necessary to complete the process. Also, at each endpoint it is assumed some processing is necessary to digest the data being handed around.
That’s all to say that things can get slowed down in a hurry if there isn’t some precaution taken in how these types of exchanges are executed.
A third-party data source that is fairly static, such as an RSS feed, might only need to be requested once per day. Working on that assumption, if the results of the first such request were cached for that duration, we could avoid the overhead and delay of requesting it again on every subsequent page load. Thus, our fictitious scenario above would behave as described for the first request (220ms round trip), and every additional request would omit the third-party trip (dropping to the 120ms figure).
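A minimal sketch of that idea using the WordPress Transients API (the feed URL and the "my_daily_feed" cache key are hypothetical):

```php
<?php
function my_get_daily_feed() {
    // Serve the cached copy if we fetched the feed within the last day.
    $cached = get_transient( 'my_daily_feed' );
    if ( false !== $cached ) {
        return $cached;
    }

    // Cache miss: pay the full round trip to the third party once.
    $response = wp_remote_get( 'https://example.com/feed.rss' );
    if ( is_wp_error( $response ) ) {
        return ''; // Fail gracefully if the third party is unreachable.
    }

    $body = wp_remote_retrieve_body( $response );
    set_transient( 'my_daily_feed', $body, DAY_IN_SECONDS );

    return $body;
}
```

Only the first visitor of the day pays the 220ms cost; everyone after them gets the 120ms version.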
RDBMS optimization
A DBA (database administrator) will often analyze the frequently-requested or resource-blocking queries that slow their system down, and from those queries derive finely tuned indexes. Good indexing is an art, and goes well beyond the scope of this narrative, but if complex queries are running poorly it may help to have a professional database expert do some profiling. This will often become an issue with a large website that has accumulated several thousand posts, and the proverbial needle becomes much harder to find as the haystack grows. Indexes may have a negative effect on write operations, but if well-designed will considerably improve read operations (which is the ultimate goal).
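Purely as an illustration of the kind of index a DBA might add (not a blanket recommendation — test against a copy of your database first), a site whose slow queries all filter on a particular custom field might benefit from a composite prefix index on wp_postmeta, which WordPress does not create by default; the index name below is hypothetical.

```php
<?php
global $wpdb;

// Hypothetical example: a composite prefix index to speed up custom-field lookups.
$wpdb->query(
    "ALTER TABLE {$wpdb->postmeta}
     ADD INDEX hypothetical_meta_key_value (meta_key(191), meta_value(32))"
);
```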
Additionally, as a table has had several operations (INSERT/UPDATE/DELETE) performed against it, things get messy. This depends heavily on the storage model used, but optimizing tables will often help a database that is performing poorly as a result of poor maintenance. For additional reading, consider How to Optimize the WordPress Database (wpexplorer.com).
Op-Code caching
The site’s PHP code can be cached as well, using an op-code cache. This is platform dependent and requires the right PHP extension (such as APC or OPcache) to be in place, but if those conditions are met, a WordPress caching plugin can take advantage of it. Many such plugins will also attempt to cache static resources (including CDN hosting for larger resources like videos or images). Regardless of the plugin chosen, the end result should be a considerable improvement in the end user’s perceived responsiveness.
This does not come without caveats, however. As with other caching strategies, there will be a performance hit on the first request (the work that would have gone into the original request, plus the added work of building the cached copy). Furthermore, this approach may not be sensitive enough to notice when the data behind the cached copy has changed, so visitors can be served a stale, outdated copy of your content. When modifying your site, forgetting this fact can be a point of significant frustration when your changes don’t appear.
The resource requirements for caching can sometimes be difficult to mitigate. There will be increased memory and storage requirements, depending on the caching strategy, and CPU load during the initial request will be higher. If a user lands on a part of your site that hasn’t been accessed recently and the cache needs to be rebuilt, that delay may be poorly received.
Ultimately, though, this is the easiest way to gain visible improvements in site responsiveness. It must be emphasized that this is not an approach meant to hide defects in site programming, as the first visitor (or the first one in a long time) to any particular page or query will suffer all of the negative consequences.