Wednesday, March 03, 2010

Performance in the Corporate Environment

It's tragic. For the better part of the last decade I've been working in IT in a corporate environment. I could go on and on about how it's slowly corrupting my soul, but I want to focus on one particular aspect today: performance.

My corporate overlords have a marvelous app that, among other things, generates reports. There is a particular trio of reports that are interrelated, so they are executed together. Each of them takes about eight hours to run against production data. To make matters worse, all three reports are executed serially, so a typical execution lasts 24 hours. That is, of course, assuming nothing goes wrong in that time.

I was charged with the task of creating a fourth report, turning this package into a quartet. Assuming a similar execution time, we would be looking at 32 hours of continuous execution. That's of course assuming there would be no further performance degradation, and that nothing would go wrong in a day and a half of continuous execution. In the face of this task I did what any naive, non-unemployment-fearing engineer would do: I dug down into the existing source to figure out what the crap was going on.

I quickly found that it wasn't the most beautiful code in the world. For some reason, some people think it's a good idea to run the same query 20K times with different parameters. Apparently it was also a good idea to copy/paste code, make redundant back-end calls, and have no inline comments whatsoever. Holding back the urge to insult anyone, I got to work. After a couple of days I got the following done:
  • Complete re-architecture of report framework.
  • Created a common foundation for reports, which makes it easy to extend existing reports and to create future ones.
  • Optimized database queries, which made execution an order of magnitude faster.
  • Made report execution parallel.
  • Dramatically lowered overall resource utilization.
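To make the query problem concrete, here is a minimal sketch of the "same query thousands of times" pattern next to a single grouped query. It is not the actual report code: the `ledger` table, its columns, the data, and the use of SQLite are all assumptions made up for illustration.

```python
import sqlite3


def build_db(n_accounts=200, rows_per_account=5):
    """Create an in-memory database with a hypothetical 'ledger' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ledger (account_id INTEGER, amount INTEGER)")
    rows = [
        (account, account + r)
        for account in range(n_accounts)
        for r in range(rows_per_account)
    ]
    conn.executemany("INSERT INTO ledger VALUES (?, ?)", rows)
    conn.commit()
    return conn


def totals_one_query_per_account(conn, account_ids):
    """The slow pattern: one round trip to the database per account."""
    totals = {}
    for account_id in account_ids:
        row = conn.execute(
            "SELECT SUM(amount) FROM ledger WHERE account_id = ?",
            (account_id,),
        ).fetchone()
        totals[account_id] = row[0]
    return totals


def totals_single_query(conn):
    """The set-based alternative: let the database group all rows at once."""
    cursor = conn.execute(
        "SELECT account_id, SUM(amount) FROM ledger GROUP BY account_id"
    )
    return dict(cursor.fetchall())


conn = build_db()
slow = totals_one_query_per_account(conn, range(200))
fast = totals_single_query(conn)
assert slow == fast  # same answer, one query instead of 200
```

Both functions return the same totals. The difference is that the first pays the per-query overhead once per account, which is what hurts when the count is 20K and the database sits across a network.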
In the end, the quartet of reports, which originally would have spent 32 hours in execution, now provides the same data output in less than two hours! I was feeling pretty good about myself, and assumed my corporate overlords would be happy with the result... They couldn't care less. On the contrary, they were unhappy because of some trivial technicalities.
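Part of that gain comes from simply not running independent reports one after another. As a rough sketch of the idea (not the real framework), the snippet below runs four stand-in "reports" serially and then concurrently. The report names and the `run_report` function are hypothetical, and `time.sleep` stands in for time spent waiting on the database. Threads are a reasonable fit only under that assumption; CPU-bound reports would need processes instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical report names; the real ones are not given in this post.
REPORT_NAMES = ["report_a", "report_b", "report_c", "report_d"]


def run_report(name, seconds=0.2):
    """Stand-in for a report that mostly waits on the database."""
    time.sleep(seconds)
    return f"{name}: done"


def run_serially(names):
    """Run each report only after the previous one has finished."""
    return [run_report(name) for name in names]


def run_in_parallel(names):
    """Run all reports at once, one worker thread per report."""
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # pool.map returns results in the same order as the input names.
        return list(pool.map(run_report, names))


def timed(fn, names):
    """Return (results, elapsed seconds) for one strategy."""
    start = time.perf_counter()
    results = fn(names)
    return results, time.perf_counter() - start


serial_results, serial_seconds = timed(run_serially, REPORT_NAMES)
parallel_results, parallel_seconds = timed(run_in_parallel, REPORT_NAMES)

print(f"serial:   {serial_seconds:.2f}s")
print(f"parallel: {parallel_seconds:.2f}s")
```

Serial time grows with the sum of the individual run times, while the parallel run is bounded by the slowest report plus some overhead, provided the database can take the concurrent load.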

This long-winded, self-serving, dramatic tragedy of a story serves a purpose: to exemplify something I see happening, not only in the corporation I work in, but all over the industry: complete disregard for efficacy, efficiency and scalability. If more hardware, people, or money can solve the problem in the present, we need not fret about the future. We need to put out today's fire. Tomorrow's fire can wait.

Yes, it's boatloads of fun to whine about the problem, or to accuse my overlords of being ignorant, but let's spend some time trying to solve the problem. What is the problem here after all? There is no incentive to write applications that make reasonable use of their resources.
  • Product directors don't participate in architectural and development tasks.
  • Architects and developers don't participate in implementation and maintenance tasks.
  • Implementation and maintenance engineers don't perform system administration.
  • System administrators don't know or don't care about the in-house software they are running.
  • Hardware is budgeted at purchase time. Once bought, it's treated as a common resource for common consumption. No metrics are recorded on performance.
This hot potato game leads to an endemic problem: hardware resources are treated as common wealth in a large corporation. Everyone depends on them, but no one party cares about the system as a whole, because it doesn't affect anyone directly. As long as you do your part, and pass the hot potato along, your next paycheck is safe. Economists like to call this phenomenon "The Tragedy of the Commons":
The tragedy of the commons refers to [...] a situation in which multiple individuals, acting independently, and solely and rationally consulting their own self-interest, will ultimately deplete a shared limited resource even when it is clear that it is not in anyone's long-term interest for this to happen.
How do we solve this problem? Forgive my use of buzzwords, but I believe the answer lies in private compute clouds. Have the system administrators provide an internal service similar to Amazon EC2 or Rackspace Cloud. Their customers would be all the other people filling the roles I mentioned previously.
  • Move all of the infrastructure to virtual environments. 
  • Have teams allocate virtual environments for everything from development, to testing, to UAT, to actual production.
  • Just like Amazon or Rackspace, teams will have options as to the size and processing power of virtual environments.
  • Tie the costs of requested resources directly to budgets, cost centers, bonuses, etc.
The more resources you need, the more you have to pay (in one form or another). This gives decision makers a direct incentive to build and tweak applications to perform as well as possible with their given resources. Assuming you have an open market, or a scenario that closely imitates it, individual incentive is the best way to achieve a goal. Adam Smith figured this out back in the 18th century. I have no idea why this principle is ignored in the corporate environment, or applied exclusively to the elite echelon of executives.
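A toy version of that chargeback could look like the sketch below. Everything in it is an assumption for illustration: the hourly rates, the team names, and the virtual machine sizes are made up, and a real internal cloud would have its own pricing model. Costs are kept in whole cents to avoid floating-point rounding.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical internal rates, in cents per hour.
VCPU_CENTS_PER_HOUR = 4
GB_RAM_CENTS_PER_HOUR = 1


@dataclass(frozen=True)
class Allocation:
    """One virtual environment requested by a team."""
    team: str
    vcpus: int
    memory_gb: int
    hours: int


def allocation_cost_cents(allocation):
    """Cost of a single allocation: hourly rate times hours used."""
    hourly = (
        allocation.vcpus * VCPU_CENTS_PER_HOUR
        + allocation.memory_gb * GB_RAM_CENTS_PER_HOUR
    )
    return hourly * allocation.hours


def chargeback_by_team(allocations):
    """Sum the cost of all allocations per team (or cost center)."""
    totals = defaultdict(int)
    for allocation in allocations:
        totals[allocation.team] += allocation_cost_cents(allocation)
    return dict(totals)


# Same made-up machine size, billed for a 32-hour run versus a 2-hour run.
allocations = [
    Allocation("legacy_reports", vcpus=8, memory_gb=32, hours=32),
    Allocation("rewritten_reports", vcpus=8, memory_gb=32, hours=2),
]

bill = chargeback_by_team(allocations)
for team, cents in bill.items():
    print(f"{team}: ${cents / 100:.2f}")
```

Once a number like this lands on a team's budget, a job that runs sixteen times longer costs sixteen times more, and that is the kind of signal a shared, unmetered server never sends.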

Startups and small companies have already figured this out, simply because they don't have money to burn on unnecessary resources. Their individual employees are much "closer" to the marketplace, and therefore are much more worried about sustainability and scalability. Corporations can't and won't figure it out, simply because the waste hides among the rest of their characteristic bloat. Their employees are several layers removed from the marketplace, and often ignore it completely. Charging teams for the resources they use can lead to leaner apps, even within the corporate context.

Private clouds are still being met with skepticism in many corporations. I won't list their advantages over traditional deployments. A quick Google search will provide much better info than what I can personally compile. Most results will point to the technological advantages or cost-cutting techniques that can be implemented. Yet changes in social and political approaches to application development are, at least to me, far more fascinating.
