Archive for July, 2008

To Get Better You Must Practice

Tuesday, July 15th, 2008

Almost everyone understands that physical activities such as golf, running, or weight lifting require lots of repetitive practice in order to get better, but most people don’t recognize that mental activities and business processes require the same practice.  We all studied in school to learn languages, algorithms, etc., but most of us either swore off studying upon graduation or forgot that study and practice are prerequisites to proficiency and excellence.  From the engineer to the manager to the CTO, everyone has skills and processes that need to be practiced and critiqued in order to improve.

As professionals no longer under the guidance of professors, we need to take responsibility for our own continuing education.  If you think that coding new features will provide you with enough stimuli for expanding your skill set, you should reexamine that idea by looking back over the past twelve months to see what you have learned that makes you a better engineer today.  If you have relied solely on assigned features, you have most likely fallen into the trap of using what you already know to code them rather than stretching to learn new designs, patterns, or algorithms.  The wondrous thing about programming is that there are many ways to solve the same problem, some faster than others, some more elegant than others.  We recommend a couple of practices to help engineers continue to learn and perfect their skills, but these can be extended to other groups such as QA.  The theme behind these recommendations is to leverage the shared knowledge of the entire organization so that people learn from each other.  This is one reason the whole is greater than the sum of the parts.

Start your engineering all-hands meeting with someone presenting a creative solution to a problem.  Have the engineering managers or architects decide which solutions are interesting enough to share with the group, or leave it to the individual engineers to decide.  Another way to ensure that you and other engineers continue to learn is to practice code reviews.  Engineers sometimes get persnickety about someone reading over their code, but this is a great way for the reviewer, as well as the author, to learn new techniques.  A final suggestion is to establish a Joint Application Design process in which members of the operations team join the engineers and architects in designing the feature.  This inclusion of different perspectives will help broaden all participants’ understanding of technology.

Practicing processes is similar to practicing skills.  If you never exercise a process, or you do so in a halfhearted manner, you will never be good at it, and when the time comes that you need that process to work perfectly it assuredly will not.  Some of the processes that get skipped too often are failovers and disaster recoveries.  If you don’t practice failing over, the process will not work when a failure occurs that requires it.  It will either take far longer than you thought, produce unexpected outcomes, or possibly fail to work at all.  Obviously you must do these exercises without impacting the production site, and it is possible to exercise a failover without bringing down production.

Remember the saying often attributed to Sun Tzu and the Art of War:  “The more you sweat in peace, the less you bleed in war.”

 

Image provided by Mike Kline

Incenting Success in Technology Organizations

Tuesday, July 8th, 2008

As we’ve discussed before in articles like Be A Leader!, the primary job of a CTO is to help the executive team maximize shareholder value.  Notice our choice of verb in the last sentence, “maximize”.  It is a much stronger word than what an average performing company would select – that word typically being “create”.  Maximizing shareholder value is the goal of a high performing team – a team which desires to say that “no other team in our position could provide the type of shareholder return that we do”.

The CTO, however, cannot maximize shareholder value, and potentially can’t even prove that he or she is creating shareholder value, without a set of aggressive goals along with the metrics and measurements that help define success or failure en route to achieving those goals.

We prefer to group our goals thematically, making it easier to determine how the goals impact the maximization of shareholder value.  Our themes include the reduction of cost, availability, the efficiency of engineering spend, the effectiveness of our product selection process, quality, and time to market.

Cost
No list of aggressive goals is complete without a set of goals to minimize the cost of operating a SaaS site.  In our experience, the best cost metrics are those normalized by transaction (cost per transaction) or normalized by transaction type (cost per checkout, cost per signup, etc.).  The associated goal is to reduce the cost by some relative value over time or to reduce the cost to an absolute value, thereby increasing profit and shareholder returns.
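As a rough illustration of the normalized cost metrics described above, the following Python sketch computes a cost per transaction and the relative reduction between two periods.  The function names and figures are hypothetical, and how you attribute operating cost to a transaction type is an assumption you will need to settle for your own site.

```python
def cost_per_transaction(total_cost: float, transactions: int) -> float:
    """Operating cost normalized by the number of transactions served."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return total_cost / transactions


def relative_reduction(previous: float, current: float) -> float:
    """Fractional reduction from previous to current (0.10 means 10% cheaper)."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (previous - current) / previous


# Hypothetical figures: cost attributed to checkout and checkout volume
# for two consecutive periods.
previous_period = cost_per_transaction(total_cost=60_000.0, transactions=2_000_000)
current_period = cost_per_transaction(total_cost=54_000.0, transactions=2_250_000)
reduction = relative_reduction(previous_period, current_period)

print(f"cost per checkout: {previous_period:.4f} -> {current_period:.4f}")
print(f"relative reduction: {reduction:.1%}")
```

A relative goal would set a target for `reduction`; an absolute goal would set a target for `current_period` itself.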

Availability
No SaaS site can realistically operate in this day and age without considering the impact of availability on revenue.  Our desire here is to identify the lost opportunity (in most cases lost revenue) associated with outages rather than just the amount of downtime a site has.  While absolute downtime is valuable and should be tracked if possible, revenue loss expressed as a percentage is more easily associated with shareholder value maximization (the less revenue lost, the better) and also takes into consideration that most sites don’t produce as much revenue in the middle of the night as they do during the middle of the day.
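One way to picture the difference between downtime and lost revenue is the Python sketch below.  It weights each outage by the revenue the site would normally expect in that hour.  The hourly revenue profile, the function name, and the assumption that revenue accrues evenly within an hour are all hypothetical simplifications, not a prescribed method.

```python
from typing import Mapping


def revenue_loss_percent(expected_revenue_by_hour: Mapping[int, float],
                         outage_minutes_by_hour: Mapping[int, float]) -> float:
    """Lost revenue as a percentage of the revenue expected for the period.

    Assumes revenue accrues evenly within each hour, so a 30 minute outage
    loses half of that hour's expected revenue.
    """
    total_expected = sum(expected_revenue_by_hour.values())
    if total_expected <= 0:
        raise ValueError("expected revenue must be positive")
    lost = sum(
        expected_revenue_by_hour.get(hour, 0.0) * minutes / 60.0
        for hour, minutes in outage_minutes_by_hour.items()
    )
    return 100.0 * lost / total_expected


# Hypothetical daily profile: quiet overnight (hours 0-7), busy otherwise.
profile = {hour: (100.0 if hour < 8 else 500.0) for hour in range(24)}

night_outage = revenue_loss_percent(profile, {3: 30})   # 30 minutes at 3 a.m.
day_outage = revenue_loss_percent(profile, {13: 30})    # 30 minutes at 1 p.m.

print(f"night outage: {night_outage:.2f}% of daily revenue lost")
print(f"day outage:   {day_outage:.2f}% of daily revenue lost")
```

Both outages are 30 minutes of downtime, yet under this profile the daytime outage costs five times as much revenue, which is the distinction a pure downtime metric hides.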

Engineering Efficiency and Productivity
You can’t be maximizing shareholder value if you aren’t measuring and improving your engineering team.  These measurements are arguably difficult, but we try to break them into two component parts: 

1) Efficiency - How many engineering days are you getting out of the theoretical maximum?  This is a measurement of how many engineering days you lose due to environment issues, training problems, tool issues, etc.  Most organizations that don’t measure this are surprised that their engineers spend well over 33% of their time on things other than designing systems and writing code.

2) Productivity - How much do you produce per engineering day?  This one is tougher, and there are lots of metrics from which you can select: KLOC, stories, function points, etc.  All of them have issues, but that’s no excuse not to select the one that works best for you and measure how well you are doing.
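The two measurements above can be sketched in a few lines of Python.  The function names, the team size, and the choice of stories as the unit of output are hypothetical; substitute whichever productivity unit you have selected.

```python
def efficiency(theoretical_days: float, lost_days: float) -> float:
    """Fraction of the theoretical maximum engineering days actually
    available for design and coding."""
    if theoretical_days <= 0:
        raise ValueError("theoretical_days must be positive")
    if not 0 <= lost_days <= theoretical_days:
        raise ValueError("lost_days must be between 0 and theoretical_days")
    return (theoretical_days - lost_days) / theoretical_days


def productivity(units_delivered: float, productive_days: float) -> float:
    """Output (stories, function points, KLOC, ...) per productive day."""
    if productive_days <= 0:
        raise ValueError("productive_days must be positive")
    return units_delivered / productive_days


# Hypothetical month: 10 engineers x 20 working days, with 70 days lost
# to environment issues, training problems, and tool issues.
theoretical = 10 * 20
lost = 70
team_efficiency = efficiency(theoretical, lost)
team_productivity = productivity(units_delivered=52, productive_days=theoretical - lost)

print(f"efficiency: {team_efficiency:.0%}")
print(f"productivity: {team_productivity:.2f} stories per engineering day")
```

Tracking the two numbers separately shows whether a drop in output comes from lost days or from less output per day worked.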

Product Efficacy
Simply put, this is a measure of how your product choices are performing.  You undoubtedly have more ideas than you can implement in any given year.  Are you choosing the right things?  Are you hitting your key metrics such as increasing revenue, decreasing drop outs, or increasing signups?

Time to Market
Assuming that you are building the right things, are you getting them out to the market in time to create barriers to entry and/or switching costs?  Are you faster or slower than your competitors?

Quality
How defect-dense is your product?  Are you fixing the problems in engineering and product management that lead to bugs in production?  Are you making the right time, cost, and quality tradeoffs?  How many defects do you introduce per new release, per line of code, or per story?
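The defect questions above all reduce to dividing a defect count by some measure of size.  A minimal Python sketch follows, assuming you can count the defects traced back to a release; the release figures and the function name are hypothetical.

```python
def defect_density(defects: int, size: float) -> float:
    """Defects per unit of size (releases, thousands of lines, stories, ...)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return defects / size


# Hypothetical release: 12 production defects traced back to it,
# 8,000 lines of code changed, 30 stories delivered.
defects = 12
per_kloc = defect_density(defects, 8_000 / 1_000)
per_story = defect_density(defects, 30)

print(f"{per_kloc:.2f} defects per KLOC, {per_story:.2f} defects per story")
```

Whichever normalizer you pick, using the same one from release to release is what makes the trend comparable.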

You may have several other key metrics that you use and which you find valuable and we’d love to hear about them.  What you cannot do, at least without significantly damaging shareholder value, is ignore the need for improvement.  You simply cannot improve your team’s performance without a core set of metrics against which you measure absolute and relative performance.  And if you are not measuring your performance you simply cannot increase and ideally maximize shareholder value.

Joint Application Design & Architecture Review Board

Thursday, July 3rd, 2008

We have mentioned a couple of key processes in other posts that we want to explain in a little more detail.  These two processes, fundamental to producing scalable and highly available architectures, are the Joint Application Design (JAD) and the Architecture Review Board (ARB).  They help create strong bonds of communication between organizations, thereby enabling shared ownership of products by all of the organizational disciplines within the extended technology team.  These processes can fit into any PDLC, be it waterfall, iterative (including Agile), or any variant of those.  If you don’t have similar processes in place, we highly recommend you consider adding them.

The JAD is usually accomplished through a series of small meetings where the architecture and design of any feature of significant size is discussed.  The participants of the JAD are the engineers assigned to a feature along with the operations/infrastructure engineers who have been assigned to assist with the feature in question.  Ideally, the meetings are held early in the development process to ensure that the design of the feature receives input from both software and operations engineers and that it does not violate the architecture principles of scalability and availability.  In an Agile development process these people can be normal members of the project team augmented by DBAs or systems administrators.  The JAD members will present to the ARB if the feature meets the criteria for board review.

The ARB is intended to catch potential scale and availability problems before they are launched to the site.  The ARB team should consist of the highest quality software and hardware engineers and members of the leadership team.  The membership of the ARB should ideally be static (i.e. change very little over time).  The ARB should convene once every development cycle (monthly is usually sufficient) to review all features that either require more than a specified number of development days (e.g. 5) or introduce a significant new technology (caching, language, service, etc.).  The ARB members should have a set of clearly defined architectural principles against which to test the new product by asking questions such as “How does this allow us to scale horizontally, maintain higher availability, etc.?”  The development engineers and operations engineers who are responsible for the design of the feature present to the board, and the board decides whether the feature was designed in such a manner that it will meet the scalability and availability requirements.
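The criteria for board review described above can be written down as a simple rule.  The Python sketch below is only an illustration; the `Feature` fields, the example features, and the default threshold of 5 development days (taken from the example figure above) are assumptions to adapt to your own process.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    name: str
    estimated_dev_days: float
    # Significant technologies the feature would introduce for the first
    # time (a new cache, language, service, etc.).
    new_technologies: List[str] = field(default_factory=list)


def requires_arb_review(feature: Feature, day_threshold: float = 5) -> bool:
    """True if the feature exceeds the development-day threshold or
    introduces a significant new technology."""
    return (feature.estimated_dev_days > day_threshold
            or bool(feature.new_technologies))


# Hypothetical features planned for one development cycle.
planned = [
    Feature("copy tweak on signup page", 1),
    Feature("saved searches", 12),
    Feature("session cache", 3, new_technologies=["distributed cache"]),
]

agenda = [f.name for f in planned if requires_arb_review(f)]
print(agenda)
```

Here the small copy change skips the board, while the large feature and the small feature that introduces a new caching layer both land on the agenda.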

Hopefully these descriptions of the processes will give you a general understanding of what is required and help you see why they are critically important to the development of scalable architectures.  There are obviously a lot of details about each of the processes that we have not covered in this post, but this should get you started.