antonymarcano.com

Knowledge-work, is serendipity-work

2012-09-18T09:10:07Z

Serendipity is core to the success of modern companies. While some may not have articulated their insights as facilitating serendipity, it was certainly what they were thinking, even if they didn’t realise it.

Enabling Serendipity

For example, Google’s 20% time, where employees can spend 20% of their time working on anything they like, has resulted in numerous “happy accidents” where pet projects have matured into fully fledged revenue enhancing products, such as gmail.

Charles Leadbeater talks of the “we think” phenomenon, where innovation occurs simply by people having access to products that give them flexibility in how they use them. This philosophy taps into the free thinking collective consciousness of the social web. The birth of Twitter is the perfect example. Hash tags, “@” mentions, retweets are all customer innovations that were subsequently added to the Twitter experience we know and love today. It’s highly likely that “@” mentions were a significant factor in boosting user engagement and hashtags are now one of Twitter’s revenue sources.

Countless more examples exist but how is this achieved and what does this have to do with serendipity?

Adrian Cockroft of Netflix explains that you can’t force innovation, you have to get out of the way of innovation. This applies to what we enable people to do with products, such as the Twitter story, as much as what people do with their 20% (or more) time.

Facilitating Serendipity

It’s not just the organisation that needs to embrace serendipity, the working environment does too. In “Designs for Working” Malcolm Gladwell highlights:

“Who, after all, has a direct interest in creating diverse, vital spaces that foster creativity and serendipity? Employers do.”

He uses Greenwich Village, where opportunities for people to meet by happenstance are rife, as a model for the ideal work environment. Many of his ideas can be found in the forward thinking companies, I’ve already mentioned, that are breaking all the rules and being so successful as a result.

Much innovation in what we produce or even how we produce it may never have happened if they had required a typical business case or if the working environment was more like a factory-line or made up of closed offices.

The Serendipity Economy

Daniel Rasmus, the person who coined the phrase “the serendipity economy”, captures the problem with your traditional business case so well when he says:

“If the only economics you have to work with come from the industrial age, then everything looks like a factory.”

Rasmus highlights, in “Welcome to the Serendipity Economy”, the benefits of knowledge, ideas, experimentation are not immediate or predictable within a specific time-frame or quantifiable with traditional measures of productivity or returns on investment. He highlights the importance of serendipity in innovation and the power of social networking in accelerating the process.

I was talking with one client the other day about their budget approvals process. It was so heavyweight that several increments of the product could have been developed with the money spent on documenting theoretical feasibility and commercial viability.

Exploiting Serendipity

Instead, companies would be better off adopting a lean start-up model for new projects. Ensure that employees have social networking tools at their disposal. Share the ideas. Create a simple sign up page that’s sent out to employees or even the general public. Get feedback. Provide “seed-funding” to the projects that inspire the most passionate responses. Get the first few features built. See how hard or easy it is. If it works out, you’ll have empirical data to base future estimates on and you’ll have the beginnings of the product (maybe something releasable).

All this can be done with less than many large companies would have spent on writing business cases with hypothesised technical-feasibility and unproven commercial-viability. Yes, some ideas will flop, but that experience could be the very thing that inspires your biggest success.

Google, Twitter, Netflix and more show us that investing in serendipity (and talent) is key to the modern organisation’s success. It will take a leap of faith that many feel they cannot afford to take in these difficult economic times. But, to survive and thrive in the future, you have to ask if it’s a leap you can afford to not take.

GhostDriver – Headless Selenium RemoteWebDriver

2012-06-21T12:47:39Z

GhostDriver is a Selenium server that can be connected to via a RemoteWebDriver.

It runs completely heedlessly on PhantomJS but still behaves (mostly) like a typical webkit browser. By avoiding the overhead of actually rendering the page in a GUI it can run standard Selenium tests noticeably faster and on a headless build server too.

This is especially beneficial if you want to use continuous testing IDE plugins like Infinitest or JUnitMax or simply to encourage developers to run end-to-end tests more frequently.

GhostDriver is written on top of PhantomJS, a headless WebKit browser which uses QtWebKit.

It’s already 35%-50% faster than ChromeDriver despite the fact that QtWebKit has the standard WebKit Javascript engine. QtWebKit is moving to the V8 JS engine (as used in Chrome) which means it’s only going to get faster…

It’s still work in progress and hasn’t implemented the entire Selenium wire protocol yet but it’s certainly one to watch.

Presentation at SeleniumConf 2012: http://www.youtube.com/watch?v=wqxkKIC2HDY

Find it on GitHub: https://github.com/detro/ghostdriver/

Implemented by: Ivan De Marino

This is not a manifesto: Valuing Throughput over Utilisation

2012-02-12T11:31:26Z

In a previous article, This is not a manifesto, I expressed the values I hold as a software development team member. Today, I’m going to talk about the first of these values.

Before I do, I’d like to say what I mean by “software development team”. I mean a cross-discipline team with the combined skills to deliver a software product – product owner, user experience, business analysts, programmers, testers, dev-ops, etc.

A common problem

Many teams I encounter will, at least in the beginning, have team-members who specialise in a single role. Each team-member will rarely, if ever, step outside their job description [1]. This can cause a problem.

Many teams find themselves in a situation where some team members have little to do on the current stories. At this point, the team has some choices, they can focus the under-utilised people on:

Future user-stories, working on tasks most relevant to their job title
Another team, working on tasks most relevant to their job title (e.g. ‘matrix management’)
Current stories, taking on tasks that will take the story to completion sooner, even if it’s more relevant to someone else’s job title
Things that will make the more utilised people get through their work faster, today and in the future.

More often than not, I find teams taking option 1. I think that people choose this option because it feels like it should increase the throughput of the team – i.e. the amount of new features they can add to the product. In fact, it has the opposite effect.

Little’s law

Let’s consider a team that is working in time-boxes or ‘sprints’ (à la Scrum) and measures it’s throughput with ‘velocity’ (story-points per-sprint). Points are accrued as each user-story is completed – i.e. coded, tested and product-owner validated.

In this particular team, it is able to complete an average of 10 points per sprint. Let’s say this is due to a bottleneck in the process. This bottleneck might limit the amount of testing that can be completed during the sprint. This is illustrated in fig.1 where each ball is a story point.

Little’s law [2] tells us that the amount of time it takes to complete an item of work (cycle-time) is:

Work in Progress (WIP)
Throughput

Which we can read as:

story points in progress
velocity

In this simplified example, WIP is 10 points and throughput is 10 points per sprint. The average cycle time of a story is therefore 10/10 = 1 sprint. Notice, however, there’s all that spare capacity.

Let’s say this spare capacity is developers. So, the team starts taking on more user-stories from the backlog – increasing the number of story points in progress to 20 points (fig.2).

Because the bottleneck remains, velocity remains the same – 10 points per sprint. The amount of flexibility that the team has, however, is now reduced because the average time required to get a story to completion has increased from 1 sprint to 20/10 = 2 sprints [3].

Often this will take the form of testers working a sprint behind the developers. Worse still, over 10 sprints, the team still only completes 100 story points (as they would have before) but is left with a lot of unfinished work. This ‘inventory’ of unfinished work carries overheads which can, ultimately, reduce the velocity of the team.

The impact of filling capacity in this way has yet another effect.

Latency effect

As the developers get through more stories, a queue of stories that are “ready for testing” will build up. Testers working through these will, at least, have questions for the developer(s) or even find some defects. The developer, having moved on from that story, is now deep into the code and context of another story. When the tester has questions, or finds defects, relating to a story that the developer had worked on, say a week ago, then the developer has to reload their understanding of this old code and context in order to answer any questions or fix any defects. This context-switching carries significant overheads [4].

The end result is that the effort required to complete a story increases due to the repeated context switching, therefore reducing velocity.

So, not only does filling the capacity with more work fail to increase throughput, it adds costly context-switching overheads – ultimately slowing everything down.

This phenomenon is not unique to teams using fixed-length time-boxes, such as sprints. Strictly following Kanban avoids this problem, but what I’ve seen is some teams creating a ‘ready for testing’ queue – so that developers can start work on the next story. This has the same latency effect and turns a process designed for continuous flow into a batch and queue process. But, I digress.

What to do with the spare capacity?

The simple answer is to look at the whole approach and determine what is slowing things down. In the example above, I’d be wondering what’s slowing the testing down. Many of the things that hinder testing can be addressed by changing how we do things ‘upstream’.

Are lots of defects being found, causing the testers to spend more time investigating and reproducing them? Can we get the testers involved earlier? Can any predictable tests be defined up front so that developers can make sure the code passes them before it even gets to the testers (e.g. Behaviour Driven Development)?

Are the testers manually regression testing every sprint? Could the developers help by automating more of that? Are the testers having to perform repetitive tasks to get to the part of the user-journey they’re actually testing? Can that be automated by the developers?

Is there anything else that is impacting them? Test data set-up and maintenance, product owner availability to answer questions? Anything else?

Addressing any of these issues is likely to speed up the testing process and increase throughput of the entire team as a result. One solution is to put these types of tasks onto the product backlog. This is fine but if we assign point values to them it can give a skewed view of velocity. Or rather, velocity is no longer a measure of throughput. You won’t be able to see if these types of tasks are actually improving things unless you are also measuring what proportion of the points are delivering new product capabilities.

The only good reason I can think of for story-pointing these throughput-enhancing tasks is if your focus is utilisation – i.e. maximising the number of story-points in progress. Personally, I care more about measuring and improving throughput. By doing so, we get the right utilisation for free and a faster, more capable team.

Up next: Valuing Effectiveness over Efficiency.

Footnotes:

[1] “Lessons Learned in Close Quarters Battle” Illustrates how stepping outside our job descriptions can move the team through each story more quickly by using special-forces room-clearing as an analogy.

[2] Little’s law (PDF) – the section “Evolution of Little’s Law in Operations Management” that references Hopp and Spearman’s observation about throughput (TH), work-in-progress (WIP) and cycle-time (CT) – i.e. TH=CT/WIP. And therefore we can say CT = WIP/TH.

[3] Little’s law also illustrates that if we reduce the work in progress to one quarter, then the cycle time for each story reduces to one quarter of a sprint. We’ll still get through 10 points per sprint, but stories will be completed more as a continuous stream throughout the sprint rather than all at the end.

[4] “The Multi-Tasking Myth” by Jeff Atwood, talks about multi-tasking across projects and pulls together several resources to illustrate the impact of multitasking at various levels. This applies when multi-tasking across stories.

Acknowledgements:

I’d like to say a special thank you to my fellow RiverGliders – Andy Palmer and James Martin – for the feedback that helped me refine this article.

This is not a manifesto

2012-02-04T10:09:39Z

Recently, I had cause to ponder the values I hold as a member of a software development team. Values that, alongside other values I hold, drive my choices and behaviours. They sit behind the things I do and how I do them. They underly my thinking when considering how we can improve as a team and as an individual contributing to that team – for example, during retrospectives. This is not a manifesto of what I will value. This is an articulation of the things I do value.

In what I do and how I seek to improve as a team member, I have found that I value:

Throughput over Utilisation
Effectiveness over Efficiency
Advancement over Speed
Quality over Quantity

While there is value in the items on the right, I care about the items on the left more.

Note, that I say I ‘care’ about the items on the left more. I actually do care. I know that I care about these things more because it actually annoys me if I’m being asked to focus on the items on the right (probably because I know that they will be at the expense of the items on the left). I also know because I actually have a positive feeling when I am being asked for the items on the left (probably because I know that, over time, we’ll get the items on the right for free).

So, what do I mean by these ‘buzzwords’. Stick around and I’ll tell you in a series of upcoming blogposts. Some of these apply in ways that go beyond the obvious.

Acknowledgements:

You probably noticed that I used the same format as the Agile Manifesto. This is because it happens to provide a structure that I believe most clearly and concisely communicates the idea.

I’d like to say a special thank you to Andy Palmer and Matt Roadnight for the awesome conversations that helped me find the right words to express these values and ideas.

Mission Critical Agility at NASA

2011-11-10T18:51:57Z

Yesterday at Better Software and Agile Development Practices East in Orlando, I enjoyed a great keynote talk from Dr Jeff Norris, of Nasa, on Mission Critical Agility. Among the things talked about was the decision to use a lunar orbit rendezvous method, rather than the direct ascent method.

One opportunity that was missed during the talk was to recognise that the Apollo programme was, in itself, an incremental and iterative project. So, NASA has been an Agile organisation for much longer than many might give them credit for.

It wasn’t until Apollo 11 that they landed someone on the moon. Each mission, from 1 to 11 (as well as the numerous missions prior to that not carrying the Apollo name), was a process of experimentation and gradual improvement towards the end goal.

Think of each Apollo mission as a ‘release’, each of which improved upon the previous. Following the tragedy of Apollo 1 (where 3 astronauts lost their life) subsequent Apollo missions were unmanned progressing back to manned flights. Incrementally, missions progressed to achieve earth orbit, then lunar orbit and finally, landing on the moon.

Learning from each mission was used to improve and refine the portion of the journey achieved so far and to guide the next increment.

I’m glad I caught Jeff afterwards to mention this. I get the impression that this point is going to find its way into future iterations of his talk.

Scenario-Oriented vs. Rules-Oriented Acceptance Criteria

2011-10-02T16:15:23Z

Acceptance Criteria, Scenarios, Acceptance Tests are, in my experience, often a source of confusion.

Such confusion results in questions like the one asked of Rachel Davies recently, i.e. “When to write story tests” (sometimes also known as “Acceptance Tests” or in BDD parlance “Scenarios”).

In her answer, Rachel highlighted that:

“…acceptance criteria and example scenarios are a bit like the chicken and the egg – it’s not always clear which comes first so iterate!”

In that article, Rachel distinguishes between acceptance criteria and example scenarios by reference to Liz Keogh’s blog post on the subject of “Acceptance Criteria vs. Scenarios”:

“where she explains that acceptance criteria are general rules covering system behaviour from which executable examples (Scenarios) can be derived“

Seeing the acceptance criteria as rules and the scenarios as something else, is one way of looking at it. There’s another way too…

Instead of Acceptance Criteria that are rules and Acceptance Tests that are scenarios… I often find it useful to arrive at Acceptance Criteria that are scenarios, of which the Acceptance Tests are just a more detailed expression.

I.e. Scenario-Oriented Acceptance Criteria.

Expressing the Intent of the Story

Many teams I encounter, especially newer ones, place higher importance on the things we label “acceptance criteria” than on the things we label “acceptance tests” or “scenarios”.

By this I mean that, such a team might ultimately determine whether the intent of the story was fulfilled by evaluating the product against these rules-oriented criteria. Worse still, I’ve observed some teams also have:

Overheads of checking that these rule-based criteria are fulfilled as well as the scenario-oriented acceptance tests
Teams working in silos where BAs take sole responsibility of criteria and testers take sole responsibility for scenarios
A desire to have traceability from the acceptance tests back to the acceptance criteria
Rules expressed as rules in two places – the criteria and the code
Criteria treated as the specification, rather than using specification by example

If we take the approach of not distinguishing between acceptance criteria and acceptance tests (i.e. see the acceptance tests as the acceptance criteria) then we can encourage teams away from these kinds of problems.

I’m not saying it solves the problem – it’s just easier, in my experience, to move the team towards collaboratively arriving at a shared understanding of the intent of the story this way.

To do this, we need to iteratively evolve our acceptance criteria from rules-oriented to scenario-oriented criteria.

Acceptance Criteria: from rules-oriented to scenario-oriented:

Let’s say we had this user story:

As a snapshot photographer

I want some photo albums to be private

So that I have a backup of my personal photos online

And let’s say the product owner first expresses the Acceptance Criteria as rules:

Must have a way to set the privacy of photo albums
Private albums must be visible to me
Private albums must not be visible to others

We might discuss those to identify scenarios (as vertical slices through the above criteria):

Create new private album
Make public album private
Make private album public

Here, many teams would retain both acceptance criteria and scenarios. That’s ok but I’d only do that if we collectively understood that the information from these two will come together in the acceptance tests… And that whatever we agree in those acceptance tests supersede the previously discussed criteria.

This understanding tends to occur in highly collaborative teams. In newer teams, where disciplines (Business Analysts, Testers, Developers) are working more independently this tends to be harder to achieve.

In those situations (and often even in highly collaborative teams) I’d continue the discussion to iterate over the rules-oriented criteria to arrive at scenario-oriented criteria, carrying over any information captured in the rules, discarding the original rules-based criteria as we go. This might leave me with the following scenarios:

Create new private album (visible to me, not to others)
Make public album private (visible to me, not to others)
Make private album public (visible to me, & others)

Which, with a subtle change to the wording, become the acceptance criteria. I.e. the story is complete when we:

Can create new private album (visible to me, not to others)
Can make public album private (visible to me, not to others)
Can make private album public (visible to me, & others)

Once the story is ‘in-play’ (e.g. during a time-box/iteration/sprint) I’d elaborate these one at a time, implementing just enough to get each one passing as I go. By doing this we might arrive at:

Scenario: Can Create new private album
When I create a new private album called "Weekend in Brighton"
Then I should be able to see the album
And others should not be able to see it

Scenario: Can Make public album private
Given there is a public album called "Friday Night"
When I make that album private
Then I should be able to see the album
And others should not be able to see it

Scenario: Can Make private album public
Given there is a private album called "Friday Night"
When I make that album public
Then I should be able to see the album
And others should be able to see it

By being an elaboration of the scenario-oriented acceptance criteria there’s no implied need to also check the implementation against the original ‘rules-oriented’ criteria.

Agreeing that this is how we’ll work, it encourages more collaborative working – at least between the Business Analysts and Testers.

These new scenario-based criteria become the only means of determining whether the intent of the story has been fulfilled, avoiding much of the confusion that can occur when we have rules-oriented acceptance criteria as well as separate acceptance test scenarios.

In short, more often than not, I find these things much easier when the criteria and the scenarios are essentially one and the same.

Why not give it a try?

Sushi, Hibachi and Other Ways of Serving Software Delicacies

2011-07-01T15:29:08Z

Let’s say you own a restaurant. Quite a large restaurant. You’ve hired a manager to run the place for you because you are about to take the fruits of your success and invest in opening three more around the country. You leave the manager with a budget to make any improvements he sees fit.

After being away for a few weeks, you return to your restaurant. It’s very busy thanks to the great reputation you had built for it. The kitchen has some new state of the art equipment and you can see that there are a lot of chefs and kitchen-staff preparing meals furiously. Obviously, with a restaurant this full you have to prepare a lot of meals.

Too many chefs?

But then, you notice that there are only a few service staff taking orders and delivering the meals to the customers. Tables are waiting a long time for their meals. You notice the service staff spending a lot of their time taking meals back to the kitchen because they have gone cold waiting so long to be served. You also overhear several customers complaining that the meal they received was not the one they ordered, increasing the demand for the preparation of more meals. You call the manager over and ask him what’s going on… he says “because we need to invest more in the kitchen”. You wonder why. He tells you “because there are so many meals to prepare and we can’t keep up”.

At this point, it’s quite obvious that the constraint is not through lack of investment in the kitchen. Instead the lack of investment is in taking the right orders and getting the meals to the customers. To solve this, I think most people would increase the investment in getting the order right and in how the meals get to the customer not by investing more into the kitchen.

Obvious, right? Despite this, time and again, I see organisations happy to grow their investment in more programmers writing more code and fixing more bugs and all but ignore the end-to-end process of finding out what people want and taking these needs through to working capabilities available to the customer. Business analysis, user-experience, testing and operations all seem to suffer in under investment. If it takes you four weeks to write some code for several new features and then another four weeks for that to become capabilities available to your customer, hiring more programmers to write more code is not going to make things any better.

Sushi and Hibachi

So, how would you advise the restaurant? In the short term, you can get some of the chefs to serve customers while you hire more service staff. Another solution is to invest in infrastructure that brings the meals closer to the customer, such as in hibachi restaurants where the chef takes the order and cooks the meal at the table or in sushi restaurants where meals are automatically transported via conveyor belt.

In software teams, we can bring the preparation of features closer to our customers. We can involve a cross-functional team in the conversation with the customer (such as the chef being at the customer’s table). We can also automate a large proportion of the manual effort of getting features from a programmer’s terminal into the hands of the customer (as in with the conveyor belt). And, there is so much we can automate.

Integration, scripted test execution, deployment and release management can largely be automated. Since speeding up these tasks has the potential to increase the throughput of an entire development organisation, it seems appropriate to invest in these activities as we would in automating any other significant business activity – such as stock-control or invoice processing.

Cooking for ourselves

Why not ask a good number of our programmers to take some time out from automating the rest of the business and automate more of our own part of the business? Taking this approach seems to be the exception rather than the rule. I see organisations having one or two dev-ops folks trying to create automated builds; and similar number of testers, with next to no programming experience, trying to automate a key part of the business – regression testing. And then asking them all to keep up with an army of developers churning out new code and testers churning out more scripted tests.

Serving up the capabilities

Instead, let’s stop looking so closely at one part of the process and look more at the whole system of getting from concept to capability. Let’s invest in automating for ourselves as much as we automate for others – and let’s dedicate some of our best people to doing so.

In short, it’s not just about the speed at which we cook up new features. It’s as much about the speed at which we can serve new, working capabilities to our customers.

To hack, or not to hack

2011-06-07T11:28:30Z

Recently, Engadget reported that Nokia had decided to abandon MeeGo in favour of Windows Mobile. What was interesting is the reported process used to make this decision:

Elop drew out what he knew about the plans for MeeGo on a whiteboard, with a different color marker for the products being developed, their target date for introduction, and the current levels of bugs in each product. Soon the whiteboard was filled with color, and the news was not good: At its current pace, Nokia was on track to introduce only three MeeGo-driven models before 2014-far too slow to keep the company in the game.

This brought to mind some blog posts I wrote in early 2010 (now copied to this blog – see previous two entries). I talked about ‘critical mass’…

Critical Mass of Software: the state of a software system when the cost of changing it (enhancement or correcting defects) is less economical than re-writing it.

In the case of Nokia and MeeGo, there isn’t enough information to know if that’s what had happened but it does suggest, at least, that it was more economical to choose an alternative third-party OS than to continue with MeeGo. Integrating Windows Mobile is going to come at some cost, of course, but clearly they decided it was less than sticking with their current strategy.

This is an example of modern markets demanding more sustainable, adaptive and agile product development cycles, and a company recognising this and acting on it.

It is also a demonstration of how an ever-growing cost of change (e.g. through the number of bugs consuming resources that would otherwise be adding new value) is really deferring of cost – i.e. technical debt.

There are many arguments for ‘prudent’ acceptance of technical debt… This story, I think, is an example of where it wasn’t obvious when that debt was no longer prudent until it was too late for MeeGo as a product.

And this is one of the problems with some lean concepts applied to new products and start-ups. The advice is often to get feedback from the market quickly by hacking the product together and release as quickly as possible. The trouble is, at what point do we take a step back and change to more sustainable practices? Probably around the time the product is selling… but then we’re under pressure from competitors who have seen our innovations and want to copy them… so we’re tempted to keep hacking out more features only to find that the faster we hack the slower we go.

There is no magic answer, to the question “when do we stop hacking and start sustainably building?” We might wait until we’re making enough money to hire more people to do remedial work.

As soon as we find we’re enhancing a feature, it might mean that it’s a key part of our product – one we’re going to keep. We might then treat any code we touch during that enhancement as legacy code, building in the cost of remedial work into the enhancement.

Some would argue that it’s faster overall to never hack things in and to commit to sustainable product development practices from the get-go. This probably applies in many cases but especially to large complex products like operating systems.

Whatever the strategy used, there will come a point where you can’t hack in any more features. The question is, will you decide when that happens or will the cost of change decide for you.

Old Favourite – More Sharks and Delaying Critical Mass

2011-06-07T10:33:39Z

This article originally featured on my old blog on 19th January 2010.

In a previous post I talked about Critical Mass of software. I showed how an ever-increasing cost of change resulted in it becoming more economical to completely rewrite the system than to enhance and maintain the original.

I explained how this could be avoided by using practices that sustain a consistent and flat cost of change. I also mentioned that you could defer reaching critical mass. Some teams find it difficult to get the time to do this because “the business” always has “more important” or “higher-value” things on their backlog.

What are the implications of reaching critical mass? Well, depending on what the software does and whether the rewritten version still has to do all of those things, it could cost millions… or more.

Presented with the situation that the product could reach critical mass within a year – costing millions to replace – do you think “the business” would start to think it is worth investing in reducing the cost of change? Obviously, I would advise teams to avoid this situation in the first place:

The reality for many teams I’ve encountered is that they don’t feel empowered to push back on the business’ demands for that next feature in half the time it takes to do it properly. If we could present back the impact of that choice as shortening the time to Critical Mass and bringing about costs to replace the software far in excess of the value gained by delivering that feature a month early then perhaps the business would be better informed.

A big visible chart on the wall, showing the estimated point at which Critical Mass was reached could play a big role in getting the business more interested in sustainable change, or at least inspire a conversation on the subject.

Convincing “the business” to invest in some remedial refactoring or allowing three times as long for each feature to facilitate enough remedial refactoring for each new feature and refactoring-as-we-go for the new feature might be easier if we could represent this idea visually:

The problem with this idea is how do we credibly determine when Critical Mass is going to be reached? Many experienced practitioners could probably reasonably accurately estimate when that was going to happen purely on gut feel but this would be torn to pieces by many product managers. I’ve not solved this problem yet because doing this would require a mathematical model, determined by empirical data.

Perhaps someone will take inspiration from the ideas in these blog posts and find a solution. Perhaps someone has already done this or maybe this is an idea for a future PhD I might undertake? Maybe someone else will take inspiration from this article and undertake that work before I do. If so, they’re welcome to (with due attribution where applicable of course

I think it is possible, however, to come up with a simple model based on the average complexity per unit of value (assuming that the team is using value and complexity). Trending this and keeping an estimate of a complete re-write up to date might allow the simple charts above to be maintained along side release-level burn-down charts.

These ideas are still in their infancy for me. Has this problem been solved already? If not, I encourage others to explore my hypothesis and help take it from just an interesting idea to something more useful.

Old Favourite – Sharks, Debts, Critical Mass and other reasons to Sustain Quality

2011-06-07T10:29:17Z

This article originally featured on my old blog on 18th January 2010.

A while back I tweeted about critical mass of software:

Critical Mass of Code – past which the changeability of the code is infeasible, requiring that it be completely rewritten.

An elaboration of this might be:

Critical Mass of Software: the state of a software system when the cost of changing it (enhancement or correcting defects) is less economical than re-writing it.

This graph illustrates a hypothetical project where the cost of change increases over time (the shape of which reminds me of a thresher shark):

Note:The cost of a rewrite gradually grows at first to account for the delay between starting the re-write and achieving the same amount of functionality in the original version of the software at the same point in time. The gap between the cost-of-change and the cost-of-rewrite begins to decrease as the time it takes to implement each feature of comparable benefit in the original version grows.

It was just one of those thoughts that popped into my head. I soon discovered that critical mass of software is not a new idea.

One opportunity that I feel organisations miss out on is that this can be used as a financial justification for repaying technical debt, thus preventing or delaying the point at which software reaches its Critical Mass, or avoiding it altogether.

Maintaining quality combined with practices that keep the effort involved in completing each feature from growing over time can defer critical mass indefinitely:

Note: The cost of a rewrite is slightly less at first assuming that we are doing a little extra work to ensure that we get the infrastructure we need up and running to support sustainable change (e.g. continuous integration etc.)

This flat cost of change curve is one of the motivations of many agile practices and software craftsmanship techniques, such as continuous integration, test-driven-development, behaviour driven development and the continuous refactoring inherent in the latter two. A flat cost-of-change trend is good for the business, as they have more predictable expenditure.

The behaviour I’ve noticed with some teams is to knowingly accrue technical debt to shorten time to a near-future delivery in order to capitalise on an opportunity and then pay it back soon after:

This certainly defers reaching critical mass but it doesn’t prevent it.

Unfortunately, I see more teams taking on technical debt and not repaying it (for various reasons, often blamed on “the business” putting pressure on delivery dates).

If we had a way of estimating the point at which we’d reach critical mass, we’d be able to compare the benefit of repaying technical debt now in order to avoid or delay the cost of a rewrite… or even better, demonstrate the value of achieving a flatter cost of change.

I’m not sure that we have the data or the means of predicting it, but perhaps these illustrations of a hypothetical scenario will help you explain why sustaining constant quality, automating repeatable tests (or checking as some prefer to call it) wherever technically possible and, on those occasions when we do need to make a conscious choice to compromise quality to capitalise on an opportunity, that the debt is repaid soon after.

One company I know reached this point a couple of years ago and embarked on a department wide transformation to grow their skills in agile methods so that they not only re-wrote the software but could also avoid ever reaching critical mass again. A bold move, but one that paid off within a year. Even for them, they’ll need to avoid complacency as time goes on and not forget the lessons of the past. Let’s hope they do.