We recently moved a client’s deployment of Drupal off a single server environment, meaning one dedicated server running both Apache as the web server and MySQL on localhost, and onto a cluster. The single server setup is a very common configuration, and moving off it is a common step for sites that grow beyond one server’s ability to meet the demands of their traffic.
You can get an overview on how to grow a single dedicated server or VPS into a cluster in our previous post.
You can view our Drupal Cluster & Drupal Load Balancer vendor here.
Mo’ users Mo’ Problems.
Upgrading Drupal to run on a cluster involves changing the way you think about the resource calls to Drupal’s individual components.
Our client’s traffic averaged about 800,000 visitors A DAY, having jumped from about 120,000 daily visits, and the single server setup, as large as it was, was struggling to keep up.
With that many visitors, it is not always easy to identify the bottlenecks just by looking at the server side of the equation: its processes, MySQL usage and so on. Almost a million visitors a day magnifies everything, to the point where even the smallest process has a great impact.
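To put those numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: the function names are ours, and the 50 ms cost per visit is a made-up figure for illustration, not something we measured on the client's servers.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_per_second(daily_visits: int, requests_per_visit: float = 1.0) -> float:
    """Average request rate implied by a daily visit count."""
    return daily_visits * requests_per_visit / SECONDS_PER_DAY


def busy_seconds_per_day(daily_visits: int, cost_ms_per_visit: float) -> float:
    """Total server time per day spent on a task that costs cost_ms_per_visit on every visit."""
    return daily_visits * cost_ms_per_visit / 1000.0


if __name__ == "__main__":
    # 50 ms per visit is a hypothetical cost, chosen only to show the scaling.
    for visits in (120_000, 800_000):
        rate = average_per_second(visits)
        hours = busy_seconds_per_day(visits, cost_ms_per_visit=50) / 3600
        print(f"{visits:>7} visits/day = {rate:.1f} visits/s on average; "
              f"a 50 ms task per visit costs {hours:.1f} server-hours/day")
```

These are averages over a full day. Real traffic arrives in peaks, so the busiest hours will be well above these figures.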
Since we did not develop this client’s sites, we did not have the luxury of building things in a scalable fashion with running Drupal on a cluster as the intended goal. Instead we took the path of least resistance and began changing, optimizing and shutting things off as follows:
You’re logging what?
In a world where Google Analytics knows all and sees all, you probably do not need to be logging every piece of information Drupal or your server can collect. With high traffic Drupal sites, every request counts, and if your Drupal build is logging to the database, these tables are going to grow and grow and grow in size. More tables means more things to get fragmented and corrupted. We removed a tracking module from one site that was writing MILLIONS of entries per day and whose tables were GIGS in size, greater than the rest of the databases on this server combined. Removing that module increased performance significantly and reduced the load on MySQL.
As long as you do not need the stats for some other reason you should consider:
- Turn off Drupal logging, Drupal Error Reporting and any other information being collected.
- Turn off statistics packages running on your servers, including AWStats, Webalizer, Analog or Logaholic Stats packages.
- Turn off statistics processing as some of these jobs run for hours and hours, even days if you are dealing with logs involving millions of visits a day.
- Turn off error logging or have a process for paring down logs to a reasonable level. Even small errors and warnings add up when talking about traffic of this magnitude.
- Consider how you are logging and storing your /var/log data. If your setup uses multiple web servers behind a load balancer that splits traffic across them, any one server’s logs are, at a glance, less useful. I am not saying you do not need these, I am just saying that with “great traffic comes great responsibility” and you should set up a retention policy for logs that does not increase your load or create storage problems. We encountered a 300 GB instance that ran out of storage within 1.5 days because it was logging everything and also had a runaway caching system, which leads us to:
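A log retention policy can be sketched in a few lines. The Python below is a minimal illustration, not what we ran on the client's servers: the `LogFile` record, the `plan_retention` function, and the age and size thresholds are all assumptions made up for the example. It decides which log files to delete so that nothing older than a maximum age is kept and the total stays under a storage budget.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogFile:
    """Hypothetical record describing one rotated log file."""
    path: str
    age_days: float   # time since the file was last written
    size_bytes: int


def plan_retention(files: List[LogFile], max_age_days: float, max_total_bytes: int) -> List[str]:
    """Return the paths to delete: everything older than max_age_days, then the
    oldest remaining files until the total size fits within max_total_bytes."""
    to_delete = [f.path for f in files if f.age_days > max_age_days]
    kept = sorted(
        (f for f in files if f.age_days <= max_age_days),
        key=lambda f: f.age_days,
        reverse=True,  # oldest first
    )
    total = sum(f.size_bytes for f in kept)
    for f in kept:
        if total <= max_total_bytes:
            break
        to_delete.append(f.path)
        total -= f.size_bytes
    return to_delete


if __name__ == "__main__":
    GB = 1024 ** 3
    # Example inventory; the names and sizes are invented.
    files = [
        LogFile("access.log.3", age_days=30, size_bytes=5 * GB),
        LogFile("access.log.2", age_days=10, size_bytes=8 * GB),
        LogFile("access.log.1", age_days=2, size_bytes=8 * GB),
        LogFile("access.log", age_days=0, size_bytes=4 * GB),
    ]
    print(plan_retention(files, max_age_days=14, max_total_bytes=15 * GB))
```

The function only produces a plan and never touches the disk, so you can review the list before anything is deleted. On most servers a tool such as logrotate would do the actual rotation and removal.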
Cache me if you can!
Hey, this pun works and you know it. Drupal caching can be very powerful and a boon to your deployment of Drupal in a cluster environment. The two caches we usually work with are Memcache/Memcached and Varnish. Both run continuously as services: Memcached is an in-memory store that Drupal uses through a module, while Varnish is an HTTP reverse proxy that sits in front of your web servers.
Drupal 7+ has its own caching system built in, while older versions of Drupal (5, 6, etc.) often rely on third-party modules like Boost.
IMPORTANT! Caching needs extra care, depending on the nature of your website and whether you make use of unique dynamic content such as:
- Ad-serving technologies, where each unit carries unique information.
- Video-serving technologies, where each load is considered unique.
- Other dynamic content such as comments, related content (off-site usually) or syndication partnerships.
Depending on how your caching is set up to cache these blocks, templates or content that contains additional dynamic content (yes, even a single unique ID will trigger a whole new page to be cached), you can quickly overwhelm your CPU or worse yet, fill up your disk with millions and millions of cache files when talking about the traffic a cluster would probably warrant.
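To illustrate the "single unique ID" problem: if the cache key is the full URL, every distinct ID produces a separate cache entry. One common mitigation is to normalize the key before caching by dropping parameters that do not change the page content. The sketch below is in Python for illustration only. It is not Drupal's or Varnish's API, and the parameter names in `VOLATILE_PARAMS` are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical per-request parameters that do not change the page content.
VOLATILE_PARAMS = {"ad_id", "session_id", "utm_source", "utm_campaign"}


def cache_key(url: str) -> str:
    """Build a cache key from a URL, ignoring volatile query parameters and
    the fragment, with the remaining parameters in a stable sorted order."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in VOLATILE_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    urls = [
        "https://example.com/node/42?ad_id=a1",
        "https://example.com/node/42?ad_id=b2",
        "https://example.com/node/42?ad_id=c3&page=2",
    ]
    raw_keys = set(urls)
    normalized_keys = {cache_key(u) for u in urls}
    print(len(raw_keys), "cache entries when keyed on the full URL")
    print(len(normalized_keys), "cache entries when keyed on the normalized URL")
```

The same idea applies whichever layer does the caching: decide which parts of a request really change the output, and key the cache on those alone.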
The Drupal Admin on a cluster – logging into Drupal
When you have a cluster of servers running Drupal, you have to plan ahead for how you are going to administer the website or do future development.
Obviously, you could make changes and then push them to each web server individually, but this is inefficient and slow, and it may cause downtime or other problems with your cluster during the time it takes you to put or get files, since only one server will have the updates at a time.
This means you have to create a deployment strategy involving something like a version control system such as Git or GitHub, or another deployment system, which we will cover in a different post.
Drupal is now in session.
As far as staying logged into Drupal on a cluster with an external database server, it is all about the load balancer, I’m afraid.
Your load balancer will have a mode called “session persistence” which needs to be set in order to stay logged into Drupal’s admin. If this is not set, every request you make may ping-pong between the different servers in your cluster and you will get the dreaded “You are not authorized to access this page” message.
The reason is simple: if you are not using session persistence with your load balancer and Drupal, then you are logging into one session on one server, then the next click “may” have you on a different server, then another and so on.
Even if you continued to log in over and over again, the sessions would not match and Drupal would kick you out.
With session persistence, you stay logged in, and so long as all of the web servers in the cluster are using the one remote database host, the changes you make will appear fine since they are in the single database.
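The ping-pong effect is easy to see in a toy model. The sketch below is Python and purely illustrative: it follows the description above in treating a login as known only to the server that handled it. The `Cluster` class, the server names, and the hash-based stickiness are all assumptions for the example, not how any particular load balancer implements persistence.

```python
import hashlib
from itertools import cycle
from typing import Dict, List, Set


class Cluster:
    """Toy model: each web server only knows the sessions it created."""

    def __init__(self, servers: List[str], sticky: bool) -> None:
        self.servers = servers
        self.sticky = sticky
        self._round_robin = cycle(servers)
        self.sessions: Dict[str, Set[str]] = {s: set() for s in servers}

    def route(self, client_ip: str) -> str:
        """Pick a server: a stable hash of the client when sticky, otherwise round robin."""
        if self.sticky:
            digest = hashlib.sha256(client_ip.encode()).digest()
            return self.servers[digest[0] % len(self.servers)]
        return next(self._round_robin)

    def login(self, client_ip: str) -> None:
        """Record the session on whichever server receives the login request."""
        self.sessions[self.route(client_ip)].add(client_ip)

    def request(self, client_ip: str) -> bool:
        """Return True if the server that receives the request knows the session."""
        return client_ip in self.sessions[self.route(client_ip)]


def authorized_requests(sticky: bool, clicks: int = 6) -> int:
    """Log in once, then count how many of the following clicks are authorized."""
    cluster = Cluster(["web1", "web2", "web3"], sticky=sticky)
    cluster.login("203.0.113.7")
    return sum(cluster.request("203.0.113.7") for _ in range(clicks))


if __name__ == "__main__":
    print("without persistence:", authorized_requests(sticky=False), "of 6 clicks authorized")
    print("with persistence:   ", authorized_requests(sticky=True), "of 6 clicks authorized")
```

Real load balancers usually offer several persistence methods, such as a cookie or the source IP. The model only shows why some form of stickiness is needed when the servers do not share session state.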
It is also important, when you create your database server, to have your host or data center put it on a private network with the web servers. This lets the web servers reach the database lightning fast instead of going over the public network.
This ends our post for now. Next time we’ll cover other optimizations of Drupal in a cluster environment and focus more on how your development and deployment need to change to accommodate such a monster. Click here if you need a good vendor for putting Drupal in the Cloud and Drupal enterprise hosting.