Cutting your losses

Originally published at Lux Group Tech Blog

Back in Feb 2017 I wrote about our decision to rebuild our entire architecture. It is established lore among the software community that re-writing software is something you should never do. This is not only because of Joel Spolsky’s excessive influence in software-land but because so many software engineers have experience of hating the system they work on, only to find that when they re-wrote it, they simply didn’t make it that much better.

Screenshot of small part of the legacy Back Office portal: a dazzling array of tabs, icons and drop down lists, most of which are irrelevant to the task at hand

There are some well-established ways of avoiding the re-write from some of the luminaries of the software world but sometimes you have to come to the conclusion that there’s nothing worth saving. At the Lux Group we were faced with a legacy platform with no discernible architecture, no automated tests, an outsourced engineering team and an overwhelming level of tech debt; we saw no real alternative and suggested the following justifications for doing a strategic rebuild (and doing it properly):

successful rebuild is normally undertaken under the following conditions:

* The company has gained enough experience to understand its business domain, the customers it is serving and the product that it wants to offer.

* The company is well-resourced, able to invest in more experienced engineers and to invest those engineers in building a product or platform for the future.

* The company stakeholders understand that building a product with solid engineering principles, built-in checks and balances and a high-performing team to run this software takes more time and tends to cost more than doing Rapid Application Development (RAD) in the same way that building and maintaining an architect designed home tends to be far more expensive to build and maintain than buying a kit home and renovating it yourself.

* The stakeholders understand that Software Engineering is Expensive and building things to last makes this even more expensive, therefore the company and team needs to be very selective around what features they opt to build.

It is now over a year later and the product luxuryescapes.com has been released without salvaging a single line of code from the original system (CRM aside). We release code every day, fix bugs before writing new features and the small amount of tech debt we have is under control. In retrospect we can see that the rebuild project could have easily spiralled out of control, the following were some of the principles we applied to ensure it didn’t.

Strong Product Team

We have this policy within our team: “if a feature doesn’t make sense to you, don’t do it”. This might sound blindingly obvious, but engineers and designers are often asked to work on stories and features that don’t make any sense to them. They cannot see why the customers would want that feature or how it benefits the product or the business. A strong Product Team can challenge senior stakeholders and sponsors to get to the root of the problem or design a holistic solution rather than applying band-aid after band-aid. An empowered engineer can expect a good explanation of the strategy under-pinning any features they are asked to deliver.

Architecture without an end state

In architecture without an end state, Michael Nygard describes how we should build architectures that are designed to be ever-changing along with the personnel and direction of the business. Lux Group is an ambitious company; by deciding to use micro-services we embraced the dynamic nature of the business and the ability to scale teams and iterate quickly. We embraced the challenges they created (reporting and atomic transactions), and avoided reverting to a monolith design when faced with these challenges.

When we sold some of our businesses, bought others and restructured the company and our business model, we had an architecture, team and process that was setup to embrace this change.

Iterative development and Continuous Delivery

From the first month of development we created an MVP that we treated as a production environment even though it was not live to the public. This MVP contained the ‘spine’ of the product (simple implementation of critical features like: search, add to cart, purchase, pay vendor). We only showcased software that had been released to that environment, and we released aggressively, with a Continuous Deployment approach. We treated every feature as an MVP, releasing early and iterating. When we failed, we demanded an RCA. When we finally launched the product, we already had the processes and discipline in place to continue the same practise.

Screenshot of Back Office portal on new platform: data is cleanly presented with consideration given to UI

Transparency

Every month we showed progress to the sponsors and we were honest about our setbacks and failures. By being honest, the delivery team did not get caught behind a lie and the sponsors saw the true rate of progress and an opportunity to consider our options: scope, time, resources.

Challenge every requirement

When you do a rebuild you get a fresh start. The conventional wisdom suggests you’re going to end up rebuilding every feature you already had so stick with what you’ve got — but this was not the case for us. We had many features that had been built over many years and were no longer that relevant. We were keen to highlight that every feature has a cost of ownership: paying an engineering team for build and maintenance as well as ongoing operational costs. Often the cost of ownership does not justify the profit that will be made from this feature. By costing and challenging every requirement you can help keep your product tight and the business focused on what it really needs. This is beautifully explained in one of my favourite blog posts The Tax of New.

What could we have done better?

We thought we had an elegant plan for solving the inevitable reporting problem that comes from having the distributed data of a microservice architecture. It turned out this plan didn’t work and we had to scramble a tactical solution while we learnt a lot more about pub/sub event-sourcing and re-constituting distributed data. If you do embark on microservices, ensure you solve this problem as part of your early MVP.

Would I do it again?

We can feel confident that no one at the company wishes we’d tried to incrementally work with what we had. We now have in place a high-performing team with a great product and platform that is being extended to build the next chapter of the Lux Group’s story. Joel Spolsky and Martin Fowler are probably correct in most cases; if you can salvage parts of your system while you rewrite others, you probably should. However, some systems are unsalvageable so don’t be afraid to start again from scratch.

Advertisement

Should we be building software?

Originally published as Journey of a Tech Stack on Lux Group’s technical blog

1_u0YNd-zYOZw3DwukKSh91A

A painting of the Tower of Babel under construction

Should we be building software?

Software engineering is difficult and expensive: this is well understood and is reflected in the commoditisation of software into hugely scalable packages (usually cloud-based) that allow us to achieve our business goals without actually writing any code.

Just 10 years ago if you wanted an e-commerce site you had to hire a developer and purchase hardware to run and manage your site. Similarly if you wanted to launch a brochure site to promote your business or a blog to publish your thoughts or promote your business you hired a developer; soon enough you began to realise how expensive software development was. It’s never set and forget; standards change, fashions change, sites need ongoing maintenance, development and redesigns.

Nowadays if you want a blog you go to Tumblr or Medium, if you want an ecommerce site you go to Shopify, BigCommerce or Magento and if you want a brochure site you go to Squarespace or WordPress; most companies should avoid writing custom bespoke software unless really necessary.

So, who does need to hire developers (or software engineers as we prefer to be called these days)? The answer to this is companies or startups that are trying to solve a new problem — that is, one that hasn’t yet been fully commoditised — hire engineers, QAs, product owners, visual designers and user experience experts to create new software that fills a need or solves a problem for their users.

What is the better way to build software?

One of the most popular contemporary approaches to building software for startups is the Lean Startup method which involves building the smallest increment of business value possible and validating that within the market. This is a relatively modern approach that can be used in companies of all sizes but also happens to be the approach that small startups on restricted budgets often do by default and have done for many years before Eric Ries had even dreamed of the term “Lean Startup”.

Phase I: The budget startup

At its genesis, a company normally has serious budget constraints and takes the fastest route to building software and delivering value to its customers. Here, at Lux Group, this is exactly how our initial technology stack was built: using PHP (the swiss-army knife of development languages), outsourced engineers and rapid application development, delivering features as quickly as possible, validating them in the marketplace and then moving on to the next feature. Sales figures were the priority, performance and user experience followed closely behind with technical integrity and system design for an uncertain and ever-changing future well down the list.

In software engineering, Software Entropy increases over time and, unless significant time and resources are invested into maintenance, systems eventually become harder to understand and maintain. If a company is lucky enough to still be in business by this time they may reach Phase II.

Phase II: The strategic rebuild

The world is littered with the corpses of companies that never made it to Phase II. Many of these companies may have over-engineered their tech stack, over-complicating a solution to a problem they did not yet fully understand. Indeed many of the companies that have been purchased by the Lux Group such as Brands Exclusive, Living Social and Pinchme were far more ambitious with their technical solutions than the Lux Group engineers, and yet were unable to compete in the market while a simple, stripped-down, lean approach suited us, allowing us to develop quickly, absorb these companies and move on to the next challenge.

Twitter famously rebuilt its platform after discovering that the original Ruby on Rails implementation was unable to cope with their traffic and replaced it with a queue based system that served their purpose better. Facebook took a different approach, creating an extremely sophisticated low-level solution to improve the speed of their php code without requiring a full rewrite of their application logic.

A strategic rebuild normally comes when a company is in a strong position. To some dominant companies this market strength can arguably be a disadvantage — in the 9 years I worked in Westfield’s Digital Labs we pivoted to a different business strategy roughly every 2 years, often requiring a complete change in stack, programming language and personnel. In a less well-resourced company this wouldn’t have even been a possibility; Westfield Labs is still actively experimenting with Westfield’s role in the digital world.

rebuild is normally undertaken when the software is unable to provide the return on investment it has previously and the company knows that significant investment is required to return the software to a healthy state. When technical debt grows high enough, a system becomes technically bankrupt and unable to deliver the features growth necessary to implement the business strategy.

successful rebuild is normally undertaken under the following conditions:

  • The company has gained enough experience to understand its business domain, the customers it is serving and the product that it wants to offer.
  • The company is well-resourced, able to invest in more experienced engineers and to invest those engineers in building a product or platform for the future.
  • The company stakeholders understand that building a product with solid engineering principles, built-in checks and balances and a high-performing team to run this software takes more time and tends to cost more than doing Rapid Application Development (RAD) in the same way that building and maintaining an architect designed home tends to be far more expensive to build and maintain than buying a kit home and renovating it yourself.
  • The stakeholders understand that Software Engineering is Expensive and building things to last makes this even more expensive, therefore the company (and team) needs to be very selective around what features they opt to build.

If these principles are understood and adhered to, a strategic rebuild will be far more cost-effective over the long-term than a rapid application approach as the system will be designed to grow and change over time, hence allowing the company to grow and change with it, protecting the company and its customers from the shock and upset of prolonged downtime while the system is rebuilt again.

Continuous Delivery

So far we have talked about rapid application development on a budget and more strategic development at a higher cost. Recent developments in cloud computing, virtualisation and automation have led to companies being able to iterate rapidly, without compromising quality, security or stability.

How?

By breaking applications into smaller components, writing extensive tests and investing in automation, companies are able to release changes to production many times a day, while managing risk to the stability of the system.

Overheads

This approach comes at a cost: there is a significant overhead to writing tests for the features you deliver. However, not writing the tests means that you are not managing your risk and are unable to guarantee stability. The division of applications into smaller components means these components often make the path to retrieving data more convoluted which, in turn, makes development of these features slower.

High-performing team

How do you maintain your ROI given these increased overheads?

Having a more complex and sophisticated architecture means you require a more sophisticated team to run it. Features are expensive to build so we should not waste our time building features of unproven benefit: Pareto’s Principle states that 80% of the value comes from 20% of the features. Changes should be smaller and incremental, with each change forming an experiment upon your customers and your team having the ability to gauge the success of that test and pursue the most promising route — the simplest example of this are the A/B tests that many companies run.

The delivery team needs to have all the components necessary for delivery: planning, design, user experience, engineering, operations etc all working together.

Solve problems rather than build features

Most importantly the delivery team needs to become expert at solving problems and the stakeholders or management team need to be able to give the delivery team the autonomy to solve these problems rather than ask the team to deliver ‘features’.

So, should we be re-building software?

Joel Spolsky describes rewriting software from scratch as the worst thing you can ever do and I agree that in most scenarios, this indeed is the case. However, sometimes there are exceptions: here at Lux Group we have an opportunity to work on rebuilding a subset of features that simplifies and changes our business requirements and we believe we satisfy the criteria outlined above to enable a successful build that hugely improves our capacity to deliver value to the business.