Project Preliminary
What should be thought about before thinking about the project

Let's start with some theory, shall we?

Project Planning

All well thought-out projects need some sort of planning. Now, I know, most hackers out there tend to start with hacking, diving right into the first upcoming hack-attack and not giving up until the application is somewhat usable. I am often no different. Unfortunately, this usually leaves us with working but sub-par solutions that need major overhauls, if not complete rewrites, whenever slight adjustments need to be made or some serious issue is detected. We may demonstrate our skills and how fast we make GOOD USE™ of them but, if we do not dare to clean up after ourselves and focus on building solid foundations, we will just leave a giant, unmaintainable mess.

The Ugly and the Beast

Major parts (if not the absolute majority) of the software industry have worked along the paradigm of extreme programming for the past decades. It brings clear advantages: creating proofs of concept quickly and delivering minimum viable products (MVPs) as fast as possible, so the client knows we are doing what they pay us to do, is great - this builds trust! Unfortunately, clients tend to have a hard time seeing reasons to clean up the mess that was generated in the hurry of that fast initial work. Why clean up something that obviously works? It is only months, maybe years later, after features upon features have been built onto that initial foundation, that the code may start to feel slow. Or finding and fixing bugs becomes so tedious that management considers a major overhaul of the project.

Then, naturally, questions of cost arise. What might have been some extra hours every now and then during early development and maintenance has by then become a complete rewrite from scratch - a huge undertaking that usually lacks both the hours and the motivation to complete. The software that took so many hours to write has reached a state where keeping it means living with a relatively big pile of unknown risk, and neither fixing it nor rewriting it from scratch seems viable. The software is then too often abandoned and replaced by some other solution - and all that precious and costly work is lost.

On the other side of the development spectrum there is the V-model. Named for the V shape formed by the cascade of requirement definitions (from high- to low-level) on one side and - on the opposite side - the acceptance tests for each requirement, the V-model is commonly used in industrial development, where solutions are developed for the long run and have safety requirements that cannot rely on the do-fast-and-never-clean-up methodology of "agile" workflows. In practice this means long, tedious meetings where bleak papers are discussed - for many, a sheer horror even to think about.

As complicated as it may sound, the V-model is a proven workflow for creating solutions that demonstrably meet their requirements. Unfortunately, this guarantee comes at a price. So - being an individual developer targeting small to medium-sized ventures - I'd suggest something in between: thinking about requirements, writing them down and defining acceptance tests early on is great; so is remaining flexible about the actual solutions.

Risk

So, what is risk, exactly, and why should we care? Let's imagine that we ought to build solutions that are supposed to last - at least for the foreseeable future. The risk I am talking about comes in one of many forms. With regard to software, we usually talk about bugs: errors that were built into the software (often without knowing the implications of some code), that only get discovered after deployment, or that only emerge once the very same software that previously ran successfully is used in an altered context (new hardware, a different network environment or stack, etc.).

So, how should we think about risk? What should be considered?

Available Expertise

Time to deliver bug fixes should be (at least somewhat) projectable: we want to be able to assure that delivering some bug fix will cost somewhere between two minutes and (worst case) a day or a week of an expert worker's time. We also want to be sure to have a team that is capable of finding bugs in the places they emerge, and not just figuratively duct-taping over some code that happens to show a symptom of the bug. Otherwise we would just willingly accumulate risk until our product finally collapses under its own weight, and - depending on the importance of said product - we lose that business.

Supply Chain

We want to be safe against issues emerging from the outside: if a piece of software we have built upon happens to show some security flaw, we want to be able either to replace it or to patch it, so it won't affect the quality of our product. This does not just hold for software products; it is equally true for hardware or the workforce: unforeseeable events like a company going bankrupt, key people quitting (although people usually go with some sort of grace period) or becoming otherwise unavailable (due to illness or death), natural disasters like fires, earthquakes, high concentrations of cosmic rays that delete all the disks in our server racks… Assessing risk is about sketching scenarios of failure and preparing mitigations.

It is crucial to ask the inconvenient questions. How long will it take to train existing staff or hire new experts to fill the gap? Where will we get our data back from if, for example, our main site vanishes after an earthquake? Will we be able to work on our data sets once our (proprietary) database provider goes bankrupt?

Reproducibility

Another risk factor is reproducibility. Though today it is usually hidden well beneath other, more superficial issues, working in non-reproducible environments is - due to the unknown risk it injects - simply bad practice. One worker quits and we hand the next one available a simple task, but instead of fixing the bug or implementing the feature, the new worker first has to re-build a functioning environment through an unnecessarily difficult game of trial and error (which is how we more often than not do it, don't we?). And - even worse - as soon as said worker has implemented the feature, the same game may have to be repeated for the CI/CD pipeline, or - imagine - the production environment. I know, most development is eased through the ubiquitous use of docker environments. These are - of course - very cool and leave us with more time to actually develop, while they just work™.

Every hacker who has tried to reproduce a bug that happens in one of the (usually three) environments - local development, CI/CD, production - but not in the others knows why this is crucial.

What about problems within a docker container? Do we feel capable of reverse-engineering those containers? Can we easily re-create them (bit by bit), pin down problems in them and eliminate them once and for all? Are we confident that an outage on, say, Docker Hub (or a similar service) wouldn't cause problems in our pipelines? Do we mirror all necessary parts on premise?
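A minimal sketch of what that mitigation can look like, assuming images are built with Docker: pin the base image by digest rather than by a mutable tag, and pull it from an on-premise mirror instead of a public registry. The registry hostname, digest and package version below are placeholders, not real values.

```dockerfile
# Pull from an on-premise mirror (hostname is a placeholder), so an
# outage of the public registry cannot break our pipelines.
# Pinning by digest instead of a mutable tag like "debian:stable"
# means the base layer is bit-for-bit identical on every build.
FROM registry.example.internal/mirror/debian@sha256:<digest-of-mirrored-image>

# Pin package versions too (version string is illustrative); a bare
# "apt-get install curl" would pull whatever happens to be current
# at build time, silently changing the environment between builds.
RUN apt-get update && apt-get install -y curl=<pinned-version>
```

The same principle applies to language-level dependencies: lock files checked into version control, and mirrored package indices on premise.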

Bottom line

The cruel thing about risk is: mitigation costs, while risk may stay invisible for a long time. But when it hits, it usually hits hard. So what is the right thing to do? Keep risk manageable.

  1. Keep risk in mind,
  2. build solid foundations for the future,
  3. regularly evaluate risk for our projects and
  4. do not wait too long to fix outstanding issues.

This should boil down to imposing solid practices on staff: review changes, code and documentation - preferably by workers from opposite ends of the spectrum (junior/senior, DevOps/coder, etc.) - and keep track of risk potential, ideally by appointing dedicated staff and budgeting adequate amounts of time for this task.

Last modified: 2025-04-09 Wed 19:07