
Caring for Complex Systems: We Can Do This

By Jessica Kerr  |   Last modified on February 27, 2023

When we work at it, we professionals are pretty good at analysis. We can break down a simple system, look at its parts and their relations, and master it. Given enough time and teammates, we can analyze a very complicated system and fix it when it breaks.

But complex systems don’t yield to analysis. We have to add another skill: sense-making.

Complex systems have parts that learn and change, with relations that vary with state and history. They respond to and influence their environment. They’re always changing, so even if you omnisciently knew how one worked at one moment, you’d be wrong the next.

Complex systems include families, teams, and distributed software systems under development.

Fortunately, we all have the skill of sense-making. It’s what we do as humans, as people. We form ideas of how our communities work, how each other works, how our social and political systems work. We form ideas of what all that means for us. And we don’t do this by knowing everything about them, but by noticing things and discussing them with each other. We do sense-making naturally in order to exist as part of the world.

Analysis, by contrast, is a skill we learned on purpose. Analysis is rigor. It gives us mastery. It promises control. It’s appropriate for parts of our system: I can totally analyze a program. Then I can change that program to do what I want. This feels good.

But once I hook my program up to software written by a bunch of other people, or put it in front of users who do who-knows-what with browser plugins, most bets are off. Now we’re in the realm of complex systems.

We need sense-making. Sense-making involves coming up with a theory, asking questions to investigate it, and getting something better than answers. The sense-making circle gives you better questions. Narrow in on what matters to you and get a good-enough understanding of that.

A good-enough understanding lets us influence our complex system to our liking. We can move it in the direction we hope for. There is no control.

Some scientists do sense-making very explicitly. I like Brené Brown’s research into what deeply happy people have in common. Her work uses a qualitative, evidence-based method for theory formation and investigation.

What does it mean to do sense-making in teams? 

Maybe as a manager, I notice that people’s voices in standup sound duller than they used to. Are they getting tired? I ask them. José says yes, his 2-year-old is sick. Marina says not really; she was just more excited by the last feature than this one they’re implementing. Xiaoping agrees: adding access controls isn’t fun. It makes everything harder for the user, and it feels so arbitrary.

Hmm, maybe the team doesn’t feel good about what they’re implementing. New question: what feels arbitrary? Everyone chimes in with an example. For one: it takes one access level to add an item, but a higher one to change it—what if a user makes a typo?

Now I have detailed questions for the product and design people. They respond, “Wow, good point, let’s meet with the whole team about these.” Soon we have a better product and a team that’s invested in the work.

Caring for complex systems: sense-making is necessary.

What does it mean to do sense-making in software? Every incident response offers an example.

Readable code & observability

Early in my career, I learned to write readable, testable code. Readable code is amenable to analysis. You can look at it carefully and predict what it will do, then verify that understanding with a unit test. That understanding keeps our programs malleable.

These days, I am careful to write observable code, leading to observable software. Observable software is amenable to sense-making. It explains itself in traces, and lets us see what happens at scale in production. Add arbitrary queries, and we get the sense-making circle of forming new questions.
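As a rough illustration of code that "explains itself in traces," here is a minimal sketch in Python. It is not a real tracing library: the span helper, the in-memory SPANS list, and the get_price and get_cart functions are all hypothetical names invented for this example, and a production service would hand its spans to a tracing backend instead of a list.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory sink. A real service would export spans to a tracing backend.
SPANS = []
_open_spans = []  # names of the spans currently in progress, innermost last


@contextmanager
def span(name, **attributes):
    """Time the wrapped block and record it as a span with its parent and attributes."""
    parent = _open_spans[-1] if _open_spans else None
    _open_spans.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        _open_spans.pop()
        duration_ms = (time.perf_counter() - start) * 1000
        SPANS.append({"name": name, "parent": parent,
                      "duration_ms": duration_ms, **attributes})


def get_price(item_id):
    # Hypothetical pricing lookup; every call shows up as its own span.
    with span("GET /price", item_id=item_id):
        return 1999  # placeholder price in cents


def get_cart(item_ids):
    with span("GET /cart", item_count=len(item_ids)):
        return {item_id: get_price(item_id) for item_id in item_ids}


cart = get_cart(["sku-1", "sku-2"])
for recorded in SPANS:
    print(recorded)
```

Each request leaves behind a small record of what ran, under which parent, for how long, and with which attributes. Those records are what later questions get asked of.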

That needs a concrete example. Here’s one: I expect my services to respond to a /cart request reliably and in a reasonable time. I can ask whether that happens with a heatmap, and find out: sometimes. Then I can ask, “Why is it slow sometimes?” I can click on a slow dot to get a trace. The trace tells me which part takes the most time: a span called GET /price. Now I ask, “Is that normal?” and aggregate all the GET /price spans, and I find a jump in latency a few hours ago. Now I know where to go: the GET /price code in the pricing service, and its recent release history. Once I’m in the code, my analysis skills come out.

Sense-making in software: using heatmaps to find outliers.
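The line of questioning in the /cart example can be sketched as three small queries over span data. This is only an illustration under stated assumptions: the span records, the 500 ms threshold, the GET /inventory span, and the function names are all made up for the sketch, and an observability tool would answer the same questions through its own query interface.

```python
from collections import defaultdict
from statistics import median

# Hypothetical span records: (trace id, span name, hour of day, duration in ms).
SPAN_RECORDS = [
    ("t1", "GET /cart", 9, 120), ("t1", "GET /price", 9, 40), ("t1", "GET /inventory", 9, 60),
    ("t2", "GET /cart", 10, 130), ("t2", "GET /price", 10, 45), ("t2", "GET /inventory", 10, 65),
    ("t3", "GET /cart", 11, 900), ("t3", "GET /price", 11, 800), ("t3", "GET /inventory", 11, 70),
    ("t4", "GET /cart", 12, 950), ("t4", "GET /price", 12, 850), ("t4", "GET /inventory", 12, 60),
]


def slow_traces(records, root, threshold_ms):
    """Question 1: which requests were slow?"""
    return [trace for trace, name, _, ms in records if name == root and ms > threshold_ms]


def slowest_child(records, trace_id, root):
    """Question 2: inside one slow trace, which span took the most time?"""
    children = [(ms, name) for trace, name, _, ms in records
                if trace == trace_id and name != root]
    return max(children)[1]


def median_by_hour(records, span_name):
    """Question 3: is that normal? Aggregate one span's latency over time."""
    by_hour = defaultdict(list)
    for _, name, hour, ms in records:
        if name == span_name:
            by_hour[hour].append(ms)
    return {hour: median(values) for hour, values in sorted(by_hour.items())}


slow = slow_traces(SPAN_RECORDS, "GET /cart", threshold_ms=500)
suspect = slowest_child(SPAN_RECORDS, slow[0], "GET /cart")
trend = median_by_hour(SPAN_RECORDS, suspect)
print(slow, suspect, trend)
```

Each answer shapes the next question: the slow traces point at a span, and the span's history points at a time. The code only narrows the search; deciding what changed around that time is still up to the people reading the result.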

We need both: analysis and sense-making

Analysis skills are essential for software development. We work on those, and that’s good, because they don’t occur naturally in humans. 

Working in production software, we also need sense-making skills. Fortunately, people have those. We can work at exercising them more explicitly.

And then there’s software! We’ve made modern programming languages and APIs more amenable to analysis: more declarative, more readable, more testable. Software can’t do sense-making. But it can help us with ours, when we make it more observable.

