Speeding Up Web Applications: Part II

February 15th, 2010 By randy Categories: System Administration

In part I we identified some of the typical culprits of a slow web application.

The art of optimization consists of identifying and diagnosing these bottlenecks without making assumptions. Because bottlenecks can exists at various locations they often produce the same symptoms and can sometimes be interrelated. For example the two separate issues: a poorly configured server, or a poorly designed application can result in the same symptom: swapping under either heavy or light load. Depending on why the swapping occurs will determine what the solution is. The engineer must be able to take measurements, analyze particular system stats and make calculated logical deductions.

In my experience it’s not uncommon for a client to think that they know what the problem is and expect help in implementing their preconceived solution. After all, they did spend the last couple years making and acting on false assumptions.

Perhaps the classic case of this was a company I spoke with who was experiencing growth pains. Traffic was increasing, they had 12 application servers, still had massive bottlenecks and were looking for someone to help setup an infrastructure that would allow them to quickly deploy additional application servers. Their website didn’t do anything unique (none of them do), and the amount of traffic they were receiving was frankly irrelevant. They were convinced, more traffic means we need more servers. They had already attempted to outsource the serving of images to “the cloud” which they admitted was a mistake. They were also in the process of moving full-text searches to Apache Solr. When I asked them what type of hardware they were running- they didn’t know! Neither did they have conclusive evidence concerning where real bottlenecks occurred. They were using a co-managed solution and were paying their webhosting company monthly for each generic server a total of around 20. Additionally they were serving both public and private network traffic which consisted of heavy NFS, MySQL and Proxy traffic from the same NICs and switches.

What shocked me about this company wasn’t how superstitious their developers were, but how certain they were that adding additional servers was the solution. Even with unprivileged access to the servers I knew there was a high likelihood that we could increase performance while decreasing the number of servers even prior to optimizing their application code and doing the work of a DBA. Analysis is not staring at the results of `tail -f /some-big/log-file` and chanting together “more traffic, bots and attackers oh my!”

While not all inclusive, when working with clients these are the typical steps and tools I use during the discovery phase:

  1. Rule out the network layer.
    • If it is network: Is it a bandwidth issue?  Or is it a network configuration issue? Most of the time it is the later. Serving private network data from the same NICs and switches can cause serious latency depending on hardware.
  2. Rule out silly things like large files per page load.
    • One of the first things I do with clients is visit their website and get an understanding of how their application works. You’d be surprised how many websites are perceptually slower due to things like loading many images from many different servers, etc.. You’d think some of these things are common sense.
  3. Survey the load on application and database servers.
  4. Check I/O and Virtual Memory:
    • On a production application and database server during some time of heavy load (artificial or real) I am looking at the results of: `vmstat` and `iostat -dx`
    • I’m not only looking for the use of swap or heavy drive activity but I’m also looking for wasted resources, like a machine only using 3GB of its full 16GB ram or vice-versa.  Sometimes we want ram free for buffer cache others we want to utilize ram for database sort_buffer, query_cache, key_buffer, tmp_table_size, table_open_cache, etc.
  5. Monitor Database (MySQL) Activity
    • Setup a slow query log.
    • `mysqladmin status`
      I’m quickly glancing at open tables vs. opens to see if table caching is misconfiguration or not configured at all, avg. queries per second, slow queries, etc…
    • Probing the size of databases and indexes:
    • mysql> SELECT count(*) TABLES, concat(round(sum(table_rows)/1000000,2),’M') rows, concat(round(sum(data_length)/(1024*1024*1024),2),’G') DATA, concat(round(sum(index_length)/(1024*1024*1024),2),’G') idx, concat(round(sum(data_length+index_length)/(1024*1024*1024),2),’G') total_size, round(sum(index_length)/sum(data_length),2) idxfrac FROM information_schema.TABLESWhich results in something like:
      | TABLES | rows  | DATA  | idx   | total_size | idxfrac |
      |   1146 | 3.75M | 0.64G | 0.07G | 0.70G      |    0.11 |

      Or for a break down of each specific database:

      mysql> SELECT count(*) TABLES, table_schema,concat(round(sum(table_rows)/1000000,2),’M') rows, concat(round(sum(data_length)/(1024*1024*1024),2),’G') DATA, concat(round(sum(index_length)/(1024*1024*1024),2),’G') idx, concat(round(sum(data_length+index_length)/(1024*1024*1024),2),’G') total_size, round(sum(index_length)/sum(data_length),2) idxfrac FROM information_schema.TABLES GROUP BY table_schema ORDER BY sum(data_length+index_length) DESC LIMIT 7;

    • `mysqladmin ext -ri10`
      Initially gives us some very valuable server stats and then repeats them every ten seconds but showing the difference in values between the last and current. Using this we can view initial red flags and see what the MySQL server did within the last ten seconds to confirm their legitimacy.We’re looking for warning signs about the use of created_disk_tmp_tables, aborted_clients, admin_commands, incorrect use of query caches, key_sorts, mis-use of thread_caching, open files, etc.
    • `mysqlreport` and `mysqltuner.pl` are also wonderful third-party tools to help automate calculations for use in analysis. Those tools provide the luxury of retrieving the information and making many calculations for you.
  6. I look at their backend code, data structures, algorithms and DB queries and see how they can be optimized.
No comments yet.

Leave a Comment