Cutting your losses

Originally published at Lux Group Tech Blog

Back in Feb 2017 I wrote about our decision to rebuild our entire architecture. It is established lore among the software community that re-writing software is something you should never do. This is not only because of Joel Spolsky’s excessive influence in software-land but because so many software engineers have experience of hating the system they work on, only to find that when they re-wrote it, they simply didn’t make it that much better.

Screenshot of small part of the legacy Back Office portal: a dazzling array of tabs, icons and drop down lists, most of which are irrelevant to the task at hand

There are some well-established ways of avoiding the re-write from some of the luminaries of the software world but sometimes you have to come to the conclusion that there’s nothing worth saving. At the Lux Group we were faced with a legacy platform with no discernible architecture, no automated tests, an outsourced engineering team and an overwhelming level of tech debt; we saw no real alternative and suggested the following justifications for doing a strategic rebuild (and doing it properly):

A successful rebuild is normally undertaken under the following conditions:

* The company has gained enough experience to understand its business domain, the customers it is serving and the product that it wants to offer.

* The company is well-resourced, able to invest in more experienced engineers and to invest those engineers in building a product or platform for the future.

* The company stakeholders understand that building a product with solid engineering principles, built-in checks and balances and a high-performing team to run this software takes more time and tends to cost more than doing Rapid Application Development (RAD), in the same way that an architect-designed home tends to be far more expensive to build and maintain than a kit home you renovate yourself.

* The stakeholders understand that Software Engineering is Expensive and building things to last makes this even more expensive; therefore the company and team need to be very selective around what features they opt to build.

It is now over a year later and the product luxuryescapes.com has been released without salvaging a single line of code from the original system (CRM aside). We release code every day, fix bugs before writing new features and the small amount of tech debt we have is under control. In retrospect we can see that the rebuild project could easily have spiralled out of control. The following were some of the principles we applied to ensure it didn’t.

Strong Product Team

We have this policy within our team: “if a feature doesn’t make sense to you, don’t do it”. This might sound blindingly obvious, but engineers and designers are often asked to work on stories and features that don’t make any sense to them. They cannot see why the customers would want that feature or how it benefits the product or the business. A strong Product Team can challenge senior stakeholders and sponsors to get to the root of the problem or design a holistic solution rather than applying band-aid after band-aid. An empowered engineer can expect a good explanation of the strategy underpinning any features they are asked to deliver.

Architecture without an end state

In Architecture Without an End State, Michael Nygard describes how we should build architectures that are designed to be ever-changing along with the personnel and direction of the business. Lux Group is an ambitious company; by deciding to use microservices we embraced the dynamic nature of the business and the ability to scale teams and iterate quickly. We embraced the challenges microservices created (reporting and atomic transactions), and avoided reverting to a monolith design when faced with these challenges.

When we sold some of our businesses, bought others and restructured the company and our business model, we had an architecture, team and process that were set up to embrace this change.

Iterative development and Continuous Delivery

From the first month of development we created an MVP that we treated as a production environment even though it was not live to the public. This MVP contained the ‘spine’ of the product (simple implementations of critical features like search, add to cart, purchase and pay vendor). We only showcased software that had been released to that environment, and we released aggressively, with a Continuous Deployment approach. We treated every feature as an MVP, releasing early and iterating. When we failed, we demanded a root cause analysis (RCA). When we finally launched the product, we already had the processes and discipline in place to continue the same practice.

Screenshot of Back Office portal on new platform: data is cleanly presented with consideration given to UI

Transparency

Every month we showed progress to the sponsors and we were honest about our setbacks and failures. By being honest, the delivery team did not get caught behind a lie and the sponsors saw the true rate of progress and an opportunity to consider our options: scope, time, resources.

Challenge every requirement

When you do a rebuild you get a fresh start. The conventional wisdom suggests you’re going to end up rebuilding every feature you already had, so you should stick with what you’ve got, but this was not the case for us. We had many features that had been built over many years and were no longer that relevant. We were keen to highlight that every feature has a cost of ownership: paying an engineering team for build and maintenance as well as ongoing operational costs. Often the profit that will be made from a feature does not justify its cost of ownership. By costing and challenging every requirement you can help keep your product tight and the business focused on what it really needs. This is beautifully explained in one of my favourite blog posts, The Tax of New.

What could we have done better?

We thought we had an elegant plan for solving the inevitable reporting problem that comes from having the distributed data of a microservice architecture. It turned out this plan didn’t work and we had to scramble a tactical solution while we learnt a lot more about pub/sub event-sourcing and re-constituting distributed data. If you do embark on microservices, ensure you solve this problem as part of your early MVP.
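To make the reporting problem concrete, here is a minimal sketch of the general idea of re-constituting distributed data from published events. It is not the solution we built: the in-memory `EventBus`, the topic names (`order.placed`, `payment.captured`) and the `SalesReport` read model are all hypothetical, and a real system would use a proper broker and durable storage.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A message a service publishes when its own data changes (hypothetical shape)."""
    topic: str      # illustrative topic name, e.g. "order.placed"
    payload: dict


class EventBus:
    """In-memory stand-in for a real pub/sub broker (assumption for this sketch)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, event):
        # Deliver the event to every handler subscribed to its topic.
        for handler in self._subscribers[event.topic]:
            handler(event)


class SalesReport:
    """Read model that combines data owned by two separate services."""

    def __init__(self, bus):
        self.rows = {}  # order_id -> combined reporting row
        bus.subscribe("order.placed", self._on_order_placed)
        bus.subscribe("payment.captured", self._on_payment_captured)

    def _on_order_placed(self, event):
        p = event.payload
        self.rows[p["order_id"]] = {"offer": p["offer"], "amount": p["amount"], "paid": False}

    def _on_payment_captured(self, event):
        row = self.rows.get(event.payload["order_id"])
        if row is not None:
            row["paid"] = True

    def paid_revenue(self):
        return sum(r["amount"] for r in self.rows.values() if r["paid"])


bus = EventBus()
report = SalesReport(bus)
# The order and payment services would publish these independently.
bus.publish(Event("order.placed", {"order_id": 1, "offer": "Bali", "amount": 900}))
bus.publish(Event("order.placed", {"order_id": 2, "offer": "Fiji", "amount": 1200}))
bus.publish(Event("payment.captured", {"order_id": 1}))
print(report.paid_revenue())
```

The point of the design is that the report never queries another service's database; it only listens, so each service stays free to change its own storage.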

Would I do it again?

We can feel confident that no one at the company wishes we’d tried to incrementally work with what we had. We now have in place a high-performing team with a great product and platform that is being extended to build the next chapter of the Lux Group’s story. Joel Spolsky and Martin Fowler are probably correct in most cases; if you can salvage parts of your system while you rewrite others, you probably should. However, some systems are unsalvageable so don’t be afraid to start again from scratch.

Should we be building software?

Originally published as Journey of a Tech Stack on Lux Group’s technical blog

A painting of the Tower of Babel under construction

Should we be building software?

Software engineering is difficult and expensive: this is well understood and is reflected in the commoditisation of software into hugely scalable packages (usually cloud-based) that allow us to achieve our business goals without actually writing any code.

Just 10 years ago, if you wanted an e-commerce site you had to hire a developer and purchase hardware to run and manage your site. Similarly, if you wanted to launch a brochure site to promote your business or a blog to publish your thoughts, you hired a developer; soon enough you began to realise how expensive software development was. It’s never set and forget; standards change, fashions change, sites need ongoing maintenance, development and redesigns.

Nowadays if you want a blog you go to Tumblr or Medium, if you want an e-commerce site you go to Shopify, BigCommerce or Magento and if you want a brochure site you go to Squarespace or WordPress; most companies should avoid writing bespoke software unless really necessary.

So, who does need to hire developers (or software engineers as we prefer to be called these days)? The answer is companies or startups that are trying to solve a new problem, that is, one that hasn’t yet been fully commoditised. They hire engineers, QAs, product owners, visual designers and user experience experts to create new software that fills a need or solves a problem for their users.

What is the better way to build software?

One of the most popular contemporary approaches to building software for startups is the Lean Startup method which involves building the smallest increment of business value possible and validating that within the market. This is a relatively modern approach that can be used in companies of all sizes but also happens to be the approach that small startups on restricted budgets often do by default and have done for many years before Eric Ries had even dreamed of the term “Lean Startup”.

Phase I: The budget startup

At its genesis, a company normally has serious budget constraints and takes the fastest route to building software and delivering value to its customers. Here, at Lux Group, this is exactly how our initial technology stack was built: using PHP (the Swiss Army knife of development languages), outsourced engineers and rapid application development, delivering features as quickly as possible, validating them in the marketplace and then moving on to the next feature. Sales figures were the priority; performance and user experience followed closely behind, with technical integrity and system design for an uncertain and ever-changing future well down the list.

In software engineering, Software Entropy increases over time and, unless significant time and resources are invested into maintenance, systems eventually become harder to understand and maintain. If a company is lucky enough to still be in business by this time they may reach Phase II.

Phase II: The strategic rebuild

The world is littered with the corpses of companies that never made it to Phase II. Many of these companies may have over-engineered their tech stack, over-complicating a solution to a problem they did not yet fully understand. Indeed many of the companies that have been purchased by the Lux Group such as Brands Exclusive, Living Social and Pinchme were far more ambitious with their technical solutions than the Lux Group engineers, and yet were unable to compete in the market while a simple, stripped-down, lean approach suited us, allowing us to develop quickly, absorb these companies and move on to the next challenge.

Twitter famously rebuilt its platform after discovering that the original Ruby on Rails implementation was unable to cope with its traffic, replacing it with a queue-based system that served its purpose better. Facebook took a different approach, creating an extremely sophisticated low-level solution to improve the speed of its PHP code without requiring a full rewrite of its application logic.

A strategic rebuild normally comes when a company is in a strong position. To some dominant companies this market strength can arguably be a disadvantage — in the 9 years I worked in Westfield’s Digital Labs we pivoted to a different business strategy roughly every 2 years, often requiring a complete change in stack, programming language and personnel. In a less well-resourced company this wouldn’t have even been a possibility; Westfield Labs is still actively experimenting with Westfield’s role in the digital world.

A rebuild is normally undertaken when the software is unable to provide the return on investment it has previously and the company knows that significant investment is required to return the software to a healthy state. When technical debt grows high enough, a system becomes technically bankrupt and unable to deliver the feature growth necessary to implement the business strategy.

A successful rebuild is normally undertaken under the following conditions:

  • The company has gained enough experience to understand its business domain, the customers it is serving and the product that it wants to offer.
  • The company is well-resourced, able to invest in more experienced engineers and to invest those engineers in building a product or platform for the future.
  • The company stakeholders understand that building a product with solid engineering principles, built-in checks and balances and a high-performing team to run this software takes more time and tends to cost more than doing Rapid Application Development (RAD), in the same way that an architect-designed home tends to be far more expensive to build and maintain than a kit home you renovate yourself.
  • The stakeholders understand that Software Engineering is Expensive and building things to last makes this even more expensive; therefore the company (and team) needs to be very selective around what features they opt to build.

If these principles are understood and adhered to, a strategic rebuild will be far more cost-effective over the long-term than a rapid application approach as the system will be designed to grow and change over time, hence allowing the company to grow and change with it, protecting the company and its customers from the shock and upset of prolonged downtime while the system is rebuilt again.

Continuous Delivery

So far we have talked about rapid application development on a budget and more strategic development at a higher cost. Recent developments in cloud computing, virtualisation and automation have led to companies being able to iterate rapidly, without compromising quality, security or stability.

How?

By breaking applications into smaller components, writing extensive tests and investing in automation, companies are able to release changes to production many times a day, while managing risk to the stability of the system.

Overheads

This approach comes at a cost: there is a significant overhead to writing tests for the features you deliver. However, not writing the tests means that you are not managing your risk and are unable to guarantee stability. Dividing applications into smaller components also tends to make the path to retrieving data more convoluted which, in turn, makes development of these features slower.

High-performing team

How do you maintain your ROI given these increased overheads?

Having a more complex and sophisticated architecture means you require a more sophisticated team to run it. Features are expensive to build so we should not waste our time building features of unproven benefit: Pareto’s Principle states that 80% of the value comes from 20% of the features. Changes should be smaller and incremental, with each change forming an experiment upon your customers and your team having the ability to gauge the success of that test and pursue the most promising route. The simplest example of this is the A/B tests that many companies run.
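As an illustration of what such an experiment involves, here is a minimal A/B test sketch. The function names, the experiment name and the visitor numbers are all made up for illustration, and a real test would also need a statistical significance check, which is left out here.

```python
import hashlib


def assign_variant(user_id: str, experiment: str) -> str:
    """Deterministically place a user in variant 'A' or 'B'.

    Hashing the experiment name together with the user id means the same
    user always sees the same variant, without storing any assignment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


def conversion_rate(conversions: int, visitors: int) -> float:
    """Fraction of visitors who converted (0.0 when there were no visitors)."""
    return conversions / visitors if visitors else 0.0


def relative_lift(control_rate: float, variant_rate: float) -> float:
    """Relative improvement of the variant over the control."""
    if control_rate == 0:
        raise ValueError("control rate must be non-zero to compute lift")
    return (variant_rate - control_rate) / control_rate


# Illustrative numbers only: 1000 visitors per variant.
control = conversion_rate(conversions=30, visitors=1000)
variant = conversion_rate(conversions=36, visitors=1000)
lift = relative_lift(control, variant)

print(assign_variant("user-42", "new-checkout-button"))
print(f"control={control:.1%} variant={variant:.1%} lift={lift:.0%}")
```

Keeping the assignment deterministic is a design choice: it keeps the experiment stateless, so any component can work out which variant a user belongs to.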

The delivery team needs to have all the components necessary for delivery: planning, design, user experience, engineering, operations, etc., all working together.

Solve problems rather than build features

Most importantly the delivery team needs to become expert at solving problems and the stakeholders or management team need to be able to give the delivery team the autonomy to solve these problems rather than ask the team to deliver ‘features’.

So, should we be re-building software?

Joel Spolsky describes rewriting software from scratch as the worst thing you can ever do and I agree that in most scenarios, this indeed is the case. However, sometimes there are exceptions: here at Lux Group we have an opportunity to work on rebuilding a subset of features that simplifies and changes our business requirements and we believe we satisfy the criteria outlined above to enable a successful build that hugely improves our capacity to deliver value to the business.

Should we be doing planning poker?

I got a message from an old colleague today with a link to a wiki page on Poker Planning and a question: “Why didn’t we do this?”

My initial reaction was that we did, we did it for years but after consideration I realised he was right – by the time he’d started working with us we had stopped using poker planning to estimate – or at least we only used it when we really needed it. So why stop? Did we lose our discipline?

The Scrum approach

Poker planning is a great way of using the wisdom of crowds to estimate stories. Points rather than days are normally used to keep the estimation abstract and these points are then tracked to calculate a team’s velocity to allow for capacity planning. In our early days of agile experimentation we followed the Scrum dogma quite rigidly – the team would gather together and examine features and help break them down to stories and then use planning poker to come up with an estimate, then the Product Owner would come along and select the stories he wanted the team to work on. More than once he commented on how he felt he was given the ‘illusion of choice’ whereby due to story size, team capacity and soft dependencies his options to select and order the backlog were severely limited.
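For readers who have not used velocity for capacity planning, the arithmetic is roughly as follows. This is a sketch under assumptions: the three-sprint averaging window and the sample numbers are illustrative, not figures from our team.

```python
import math
from statistics import mean


def velocity(completed_points, window=3):
    """Average story points completed over the most recent sprints."""
    recent = completed_points[-window:]
    if not recent:
        raise ValueError("need at least one completed sprint")
    return mean(recent)


def sprints_needed(backlog_points, completed_points, window=3):
    """Estimate how many sprints the remaining backlog will take."""
    v = velocity(completed_points, window)
    if v <= 0:
        raise ValueError("velocity must be positive to forecast")
    return math.ceil(backlog_points / v)


# Illustrative history: points completed in each past sprint, oldest first.
history = [21, 18, 24, 20]

print(round(velocity(history), 1))
print(sprints_needed(100, history))
```

The forecast is only as good as the estimates behind it, which is part of why the abstraction can mislead when stories are not truly independent.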

This approach led to us having a ‘well-groomed’ backlog, but many of the groomed and estimated stories would languish in the backlog for a long time as small tokens of wasted time, growing stale and decreasing the signal-to-noise ratio. Furthermore, production bugs sat in their own backlog, prioritised separately and largely ignored by a product owner focused on building ‘the new’ and a team attempting to maintain its velocity.

A Product Owner’s role is to generate the maximum ROI by picking the stories which return the most business value for the amount of effort expended. However, the reality is often far more subtle than this: while each feature or story is technically an independent release of business value, more often than not it is part of a much bigger picture and really needs to be played in a specific order that the team understands. Backlog grooming, planning poker and sprint-stacking can become a transactional (rather than collaborative) affair, leading to a poorly maintained and planned product.

Continuous deployment and a high-performing team

By the time the aforementioned colleague had joined the team we had restructured our approach. The Product Owner was embedded in the team and story and feature selection had become collaborative; there was no formal backlog grooming, and commitments were agreed by small teams based upon the entire team having an in-depth understanding of both the business benefits of each feature and the estimated effort. The roadmap was understood by the whole team and any upcoming work had either been spiked already or been analysed by the engineers and collaborated on (with UX and BAs as necessary).

Furthermore, specialisation was respected and ownership of components encouraged: while we all love full-stack engineers, in a small team the reality is you may only have one real JS expert, one back-end expert and one HTML/CSS expert. Working in iterations of one week at most, with a definition of done of “released to production”, the abstraction of poker-planning-driven velocity becomes irrelevant: it’s easier for the team to work out what they can achieve that week, estimating by days if necessary. With the onus on quality and user experience, fixing production bugs is treated as a priority and often needs to be scheduled at short notice, which affects deliverables; focusing on hard, velocity-based commitments can therefore encourage a compromise on product quality.

So why did we stop Poker Planning?

When a product is well-established and development is lean and iterative, the team should be in control of its own destiny, understanding the vision and goals of the business and driving towards those goals with a roadmap for guidance if necessary. With the product owner embedded as part of the team and in constant collaboration with his team members, poker and velocity-based capacity planning become irrelevant and the one important question becomes: “What can you (or even better, ‘we’) commit to delivering to production this week?”

With good stakeholders (or good stakeholder management) the sponsors will ask “what goals or improvements have we achieved” rather than “how many features did we complete”. With code being released to production on a weekly (or daily) basis the transparency that this approach delivers encourages trust between stakeholders, product owners and the delivery team, and negates the need for the transactional approach of poker planning, sprint stacking and velocity tracking.

SOA: An enabler for Continuous Delivery and innovation

Building on my experience at Westfield Labs, this presentation was delivered to Sydney CTO Summit and explores how implementing a Service Oriented Architecture allowed Westfield Labs to embrace a Lean, Agile approach to Product delivery.

The presentation covers how an SOA assisted with:

  • Management of backlog, particularly bugs
  • Managing build times
  • Cross-functional teams
  • Faster iterations
  • The ‘QA Paradox’ of better quality through reducing testers

User stories and asking “Five Whys”

We are currently coaching a new team in the ways of Agile and one of the problems we’ve encountered is getting the devs to write cards or stories using the standard Agile story formats. To Agile newbies the syntactic sugar that surrounds a story’s details often seems like a waste of time. For example, a developer knows exactly what he means when he writes:

“Add currency code to Data Warehouse views”

and feels like he is being made to jump through hoops to turn that into:

“In order to differentiate between international sales, we need to update the data warehouse transactional views to show the currency code, so that they can report on this data.”

The reason we favour this (or the “As a [user]…” format) on agile projects is that the story describes what needs to be done and why. This means that any member of the team can understand exactly why a story has been added to the backlog and doesn’t need to get an explanation from the person that wrote the story or drill down into the acceptance criteria to discover this.

The other benefit of this format is that it forces the person writing the story to find out exactly why the requester wants that story. Indeed, in drilling down to the real requirement, the analyst may discover that the real business requirement is not what the requester is asking for at all.

Aslak Hellesoy (the creator of Cucumber) illustrates all this perfectly in the Cucumber documentation, where he describes the process of asking the Five Whys to discover the underlying requirements: the why in the story.

(Shamelessly copied directly from the Cucumber wiki for your convenience.)

[5:08pm] Luis_Byclosure: I’m having problems applying the “5 Why” rule, to the feature
“login” (imagine an application like youtube)
[5:08pm] Luis_Byclosure: how do you explain the business value of the feature “login”?
[5:09pm] Luis_Byclosure: In order to be recognized among other people, I want to login
in the application (?)
[5:09pm] Luis_Byclosure: why do I want to be recognized among other people?
[5:11pm] aslakhellesoy: Why do people have to log in?
[5:12pm] Luis_Byclosure: I dunno… why?
[5:12pm] aslakhellesoy: I’m asking you
[5:13pm] aslakhellesoy: Why have you decided login is needed?
[5:13pm] Luis_Byclosure: identify users
[5:14pm] aslakhellesoy: Why do you have to identify users?
[5:14pm] Luis_Byclosure: maybe because people like to know who is
publishing what
[5:15pm] aslakhellesoy: Why would anyone want to know who’s publishing what?
[5:17pm] Luis_Byclosure: because if people feel that that content belongs
to someone, then the content is trustworthy
[5:17pm] aslakhellesoy: Why does content have to appear trustworthy?
[5:20pm] Luis_Byclosure: Trustworthy makes people interested in the content and
consequently in the website
[5:20pm] Luis_Byclosure: Why do I want to get people interested in the website?
[5:20pm] aslakhellesoy: 🙂
[5:21pm] aslakhellesoy: Are you selling something there? Or is it just for fun?
[5:21pm] Luis_Byclosure: Because more traffic means more money in ads
[5:21pm] aslakhellesoy: There you go!
[5:22pm] Luis_Byclosure: Why do I want to get more money in ads? Because I want to increase
de revenues.
[5:22pm] Luis_Byclosure: And this is the end, right?
[5:23pm] aslakhellesoy: In order to drive more people to the website and earn more admoney,
authors should have to login,
so that the content can be displayed with the author and appear
more trustworthy.
[5:23pm] aslakhellesoy: Does that make any sense?
[5:25pm] Luis_Byclosure: Yes, I think so
[5:26pm] aslakhellesoy: It’s easier when you have someone clueless (like me) to ask the
stupid why questions
[5:26pm] aslakhellesoy: Now I know why you want login
[5:26pm] Luis_Byclosure: but it is difficult to find the reason for everything
[5:26pm] aslakhellesoy: And if I was the customer I am in better shape to prioritise this
feature among others
[5:29pm] Luis_Byclosure: true!

https://github.com/cucumber/cucumber/wiki/

Valve Handbook

Employee Handbook describing Valve’s completely flat management structure. Not exactly a methodology that you could apply to a big shareholder owned corporate but a great example of how successful you can be if you hire good people and allow them to get on with their job.

Scaling a development team

In my job at westfield.com we’ve been dealing with the challenge of scaling a development team for years and are yet to get it anywhere near optimum. This is a great post on scaling a team from a company that should know all about it: Heroku.

http://adam.heroku.com/past/2011/4/28/scaling_a_development_team/

This is a great way to think about scaling and well worth a read, although I think that most of the challenges one will meet scaling a team are not answered by this post, for example exactly how and along what lines you break up your teams. It’s interesting to see that Heroku broke up their teams along horizontal layers rather than through functional areas, but this probably reflects the highly technical nature of Heroku’s business.

We are currently undertaking a team restructure and I have been convinced for some time that to minimise friction, distraction and duplication, our teams at westfield.com need to be structured based on vertical slices of business functionality; allowing teams to own certain areas and functions of the site and business.

I’ll post again with an update when the teams are restructured and the results are in.