My corporate overlords have a marvelous app that, among other things, generates reports. There is a particular trio of reports that are interrelated, so they are executed together. Each of them takes about eight hours to run against production data. To make matters worse, all three reports are executed serially, so a typical execution lasts 24 hours. That is, of course, assuming nothing goes wrong in that time.
I was charged with the task of creating a fourth report, turning this package into a quartet. Assuming a similar execution time, we would be looking at 32 hours of continuous execution. That, of course, assumes there would be no further performance degradation and that nothing would go wrong in nearly a day and a half of continuous execution. In the face of this task I did what any naive, non-unemployment-fearing engineer would do: I dug down into the existing source to figure out what the crap was going on.
I quickly found that it wasn't the most beautiful code in the world. For some reason, some people think it's a good idea to run the same query 20K times with different parameters. Apparently it was also a good idea to copy and paste code, make redundant back-end calls, and have no inline comments whatsoever. Holding back the urge to insult anyone, I got to work. After a couple of days I got the following done:
- Completely re-architected the report framework.
- Created common ground for reports, which makes it easy to extend existing reports and to create future ones.
- Optimized database queries, which made execution an order of magnitude faster.
- Made report execution parallel.
- Dramatically lowered overall resource utilization.
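To make the two biggest wins concrete, here is a minimal sketch, not the actual report code. It assumes a made-up `orders` table in an in-memory SQLite database, and every function name in it is hypothetical. It contrasts running one query per parameter with a single grouped query, and then runs independent reports side by side instead of back to back.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_db():
    """Build a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(i % 100, i) for i in range(1000)],
    )
    return conn


def totals_one_query_per_customer(conn, customer_ids):
    """The anti-pattern: the same query, executed once per parameter."""
    totals = {}
    for cid in customer_ids:
        row = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
            (cid,),
        ).fetchone()
        totals[cid] = row[0]
    return totals


def totals_single_query(conn):
    """The fix: let the database do the grouping in one round trip."""
    rows = conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    )
    return dict(rows)


def run_reports_in_parallel(report_funcs):
    """Run independent reports concurrently; returns {report name: result}."""
    with ThreadPoolExecutor(max_workers=len(report_funcs)) as pool:
        futures = {name: pool.submit(func) for name, func in report_funcs.items()}
        return {name: future.result() for name, future in futures.items()}


def customer_totals_report():
    # Each report opens its own connection inside its worker thread,
    # because sqlite3 connections are bound to the thread that created them.
    conn = make_db()
    try:
        return totals_single_query(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    results = run_reports_in_parallel(
        {"report_a": customer_totals_report, "report_b": customer_totals_report}
    )
    print(sorted(results))
```

Threads are a reasonable fit here on the assumption that the reports spend most of their time waiting on the database. If the reports were CPU-bound in Python itself, a process pool would likely be the better choice.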
- Product directors don't participate in architectural and development tasks.
- Architects and developers don't participate in implementation and maintenance tasks.
- Implementation and maintenance engineers don't perform system administration.
- System administrators don't know or don't care about the in-house software they are running.
- Hardware is budgeted at purchase time. Once bought, it's treated as a common resource for common consumption. No metrics are recorded on performance.
The tragedy of the commons refers to [...] a situation in which multiple individuals, acting independently, and solely and rationally consulting their own self-interest, will ultimately deplete a shared limited resource even when it is clear that it is not in anyone's long-term interest for this to happen.
How do we solve this problem? Forgive my use of buzzwords, but I believe the answer lies in private compute clouds. Have the system administrators provide an internal service similar to Amazon EC2 or Rackspace Cloud. Their customers would be all the other people filling the roles I mentioned previously.
- Move all of the infrastructure to virtual environments.
- Have teams allocate virtual environments for everything from development, to testing, to UAT, to actual production.
- Just like Amazon or Rackspace, teams will have options as to the size and processing power of virtual environments.
- Tie the costs of requested resources directly to budgets, cost centers, bonuses, etc.
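As a rough illustration of that last point, here is a minimal chargeback sketch. The environment sizes, the hourly rates, and the `Allocation` record are all assumptions invented for this example, not anything a real provider or my employer uses. The idea is simply that every allocated environment rolls up into a bill for the cost center that asked for it.

```python
from dataclasses import dataclass

# Hypothetical internal price list, in cents per hour, per environment size.
HOURLY_RATE_CENTS = {"small": 5, "medium": 20, "large": 80}


@dataclass
class Allocation:
    """One virtual environment requested by a team (hypothetical record)."""
    cost_center: str
    purpose: str  # e.g. "development", "testing", "UAT", "production"
    size: str     # must be a key of HOURLY_RATE_CENTS
    hours: int


def charge_by_cost_center(allocations):
    """Sum the cost of every allocation, grouped by cost center (in cents)."""
    totals = {}
    for alloc in allocations:
        cost = HOURLY_RATE_CENTS[alloc.size] * alloc.hours
        totals[alloc.cost_center] = totals.get(alloc.cost_center, 0) + cost
    return totals


if __name__ == "__main__":
    usage = [
        Allocation("reports", "production", "large", 24),
        Allocation("reports", "testing", "small", 10),
        Allocation("billing", "development", "medium", 5),
    ]
    for cost_center, cents in sorted(charge_by_cost_center(usage).items()):
        print(f"{cost_center}: ${cents / 100:.2f}")
```

Once a bill like this lands on someone's budget, a report that hogs a large environment for 24 hours stops being a free ride on a common resource and starts being a line item someone has to justify.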