I just finished reading Team Topologies by Matthew Skelton and Manuel Pais. While many of its recommendations are geared towards companies with a large software development organization, I found a lot of great insight for anyone who develops software with more than one or two other people. Team Topologies relentlessly applies Conway’s Law and the “Reverse Conway Maneuver” to building an organization. Melvin Conway’s Law simply states that the systems built by an organization will reflect the way that organization communicates. The “Reverse Conway Maneuver” is an application of Conway’s Law: if you want to build a system with a certain architecture, then you need to build the organization to fit that architecture.
Let’s start to understand this by looking at a fictional developer. I’ll call her Sonia. Sonia works for the fictional company Widgets R Us. Sonia is a senior developer with deep knowledge of how her company operates and her users’ needs. Sonia is one of the developers who primarily work on the business logic of their website. Among her responsibilities are:
- Order Pricing
- Sales Tax
- Shipping Calculations
- Order Fulfillment
- User Registration
Sonia knows what good, well-designed code looks like. She can explain SOLID in detail and both defend it and critique it. All of her code has good test coverage and others can easily understand it. She mentors other developers and has a great all-around reputation in the company. She even maintains a blog and speaks at developer conferences. Sonia has one problem though: she’s human. She can’t be perfect and, like most of us, she tends to solve the immediate problem first, then think about future problems that may need to be solved.
If you look at Sonia’s responsibilities, a lot of them have to do with pricing an order. She wrote the core of that code years ago. It was well designed, but it was still written by one person. To calculate an order price, she needed to do a lot of things:
    for each item in order line items
        determine line item discounts
        calculate line item subtotal
    calculate order subtotal
    calculate order discounts
    calculate shipping
    calculate sales tax
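The steps above can be sketched in Python. This is only an illustration of the flow, not Sonia's actual code: the `LineItem` type, every function name, and every rule and rate (buy 1 get 1 free, $10 off over $100, free shipping at $50, a flat 8% tax) are assumptions made up for the example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class LineItem:
    sku: str
    unit_price: Decimal
    quantity: int


def line_item_discount(item: LineItem) -> Decimal:
    # Hypothetical rule: buy 1 get 1 free, so every second unit is free.
    free_units = item.quantity // 2
    return item.unit_price * free_units


def order_discount(subtotal: Decimal) -> Decimal:
    # Hypothetical rule: spend over $100 and get $10 off.
    return Decimal("10") if subtotal > Decimal("100") else Decimal("0")


def shipping_cost(subtotal: Decimal) -> Decimal:
    # Stand-in for the outside shipping pricing service (assumed rates).
    return Decimal("0") if subtotal >= Decimal("50") else Decimal("5")


def sales_tax(taxable: Decimal) -> Decimal:
    # Stand-in for the outside sales tax service (assumed flat 8% rate).
    return (taxable * Decimal("0.08")).quantize(Decimal("0.01"))


def price_order(items: list[LineItem]) -> Decimal:
    subtotal = Decimal("0")
    for item in items:
        # Line item discounts, then the line item subtotal.
        discount = line_item_discount(item)
        subtotal += item.unit_price * item.quantity - discount
    # Order-level discounts, then shipping and sales tax.
    discounted = subtotal - order_discount(subtotal)
    return discounted + shipping_cost(discounted) + sales_tax(discounted)
```

Notice that even in this toy version, every step lives in one module and one person's mental model, which is exactly the situation the rest of this post is about.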
When she wrote the code, the discount logic alone was pretty complex. There are lots of ways to get a discount:
- discounts could be applied to single line items (e.g., buy 1 get 1 free)
- discounts could be applied to the whole order (spend over $100 and get $10 off)
- discounts could be applied to shipping (spend $50 and get free standard shipping)
- discounts could be applied based on the amount the customer has spent in the last month
- discounts could be applied based on a code the customer entered, not money spent
- additional promotions can be applied like the customer gets a free gizmo if they buy 10 widgets
All of this is contained in a fairly well built but complex set of code and data structures. She spent a lot of time with the right subject matter experts when she designed it and managed to build a system that could handle every type of promotion and discount the company could come up with, and they came up with a lot of different scenarios. Lots of tests meant no real bugs were ever found in the code itself and made sure the code didn’t break when they had to extend it. Sure, there was that time one of the business admins configured a discount the wrong way, but that wasn’t a problem with Sonia’s code; that was mis-entered data.
The thing is, it was all done by Sonia. She handled every situation based on an assumption that discounts were always an internal company decision. She also built it thinking of pricing as a single service. She didn’t need to collaborate much with other developers when writing the code itself, although she did get their input on her design for the service. She also met frequently with the product and marketing teams to understand the requirements. Even though she built in plenty of extensibility to adopt other features, that was limited by what she imagined. Sure, she built in some microservices and service oriented architecture and message oriented middleware, but she never really thought that one day discounts and promotions might come from outside the system. She applied those concepts to things like shipping and sales tax calculations, knowing that there were third parties that would handle those. She also expected there would be some future complexity with calculating product prices because the company was talking about customizing products for customers. However, she always thought about discounts and promotions as an internal part of this system she was building. Then, when the company made a deal with a third party to manage discounts (maybe something like Groupon or maybe some sort of loyalty program that spans brands), she wasn’t ready for it. That other system wanted to calculate the discount in one pass, so her multi-step logic for calculating discounts didn’t line up.
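To make the mismatch concrete, here is a rough sketch of what a seam around discounts might have looked like. Everything in it is hypothetical: the `DiscountSource` protocol, both classes, and the rules and percentages are invented for illustration and are not taken from any real discount provider's API.

```python
from decimal import Decimal
from typing import Protocol

ZERO = Decimal("0")


class DiscountSource(Protocol):
    """Hypothetical seam: anything that can produce one discount for an order."""

    def total_discount(self, line_totals: list[Decimal]) -> Decimal: ...


class InHouseDiscounts:
    """Multi-step, like the in-house logic: line-level first, then order-level."""

    def total_discount(self, line_totals: list[Decimal]) -> Decimal:
        # Assumed rule: 10% off any line worth $50 or more.
        line_level = sum(
            (t * Decimal("0.10") for t in line_totals if t >= Decimal("50")), ZERO
        )
        remaining = sum(line_totals, ZERO) - line_level
        # Assumed rule: $10 off when the remaining subtotal is over $100.
        order_level = Decimal("10") if remaining > Decimal("100") else ZERO
        return line_level + order_level


class ThirdPartyDiscounts:
    """One pass: stand-in for an outside system that returns a single figure."""

    def __init__(self, percent_off: Decimal) -> None:
        self.percent_off = percent_off

    def total_discount(self, line_totals: list[Decimal]) -> Decimal:
        total = sum(line_totals, ZERO)
        return (total * self.percent_off).quantize(Decimal("0.01"))


def discounted_total(line_totals: list[Decimal], source: DiscountSource) -> Decimal:
    # The pricing code only needs the final discount, not how it was derived.
    return sum(line_totals, ZERO) - source.total_discount(line_totals)
```

With a seam like this, swapping in the third party is a matter of passing a different `source`. The hard part, of course, is knowing ahead of time that discounts are where the seam belongs.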
So what we end up with is:
- A checkout service that communicates with:
  - A pricing service that:
    - Includes an internal subtotal calculation
    - Includes an internal discount calculation
    - Communicates with a shipping pricing service
    - Communicates with a sales tax service
    - Includes order total calculation
That reflects an organization structure:
- Developers working on the checkout service
- Sonia working on a pricing service including:
  - Sonia working on a subtotal calculation
  - Sonia working on a discount calculation
  - An outside firm offering a shipping pricing service
  - An outside firm offering a sales tax service
This is obviously over-simplified, but hopefully you can see the point: the services were built in a way that reflected the structure and communication tendencies of the organization. If Widgets R Us wanted to be sure that all the bits of a pricing service were built as separate services, they could have organized it this way:
- Seth works on checkout
- Sam works on subtotal calculation
- Sonia works on discount calculation
- Seth works on integrations with the shipping pricing service
- Sonia works on integrations with the sales tax service
- Sam works on order total calculation
Since Sonia won’t be up to her eyeballs in all of the pricing code, she will naturally tend to build her discount and sales tax code in such a way that they stand independently of the other bits. This is the essence of the Reverse Conway Maneuver: design the organization to reflect the desired system design. Again, this is over-simplified. It also is not a terribly good example. We are, after all, picking on Sonia for not having an insight, so we shouldn’t assume anyone else would have that insight!
Why don’t we always create this division of responsibility? Again, it takes some insight into the desired end state, which can be as difficult as predicting the winner of next year’s sportsball championship. It can also heavily disrupt development flow and make debugging much more difficult, as the code that implements a single workflow is no longer in one person’s head. We can somewhat mitigate the latter by making sure we’ve got good infrastructure to support debugging (e.g., logging & alerting) and by making sure we have a clearly defined bounded context (this example isn’t great because there is a clear context if you only look at it from the outside). Good coding practices can also mitigate the inability to predict the future by making sure we’re not just building to get the feature released. Good coding practice also helps us build for maintainability and readability. Keep in mind that our first job as developers is to make the next person’s job easier.
You may have guessed that Sonia’s development organization divides the work up between a team of developers who work on the application services (business logic and data storage) and another team of developers who work on the user interface. This division of teams by technical specialty (UI vs application services) rather than business domain causes its own issues. These issues are worse than needing to refactor a bit of discount code now and then to meet changing business needs. If the UI team and app services team have separate backlogs of work, they have different priorities, and that will inevitably cause one team to do something that doesn’t live up to the desired system design. For example, if the backlogs have the UI team changing the order summary screen and there’s a new subtotal that should be shown, they may end up calculating it in the UI code if the app services team has not made it available yet. This could lead to confused or angry users when the logic changes yet again and they see one total yet get charged a different total. Just resolving routine bugs can become more difficult with this sort of organizational structure.
Let’s say this issue is reported to the UI team: the amount displayed for the price on a product catalog page is not correct. The UI developer sees the issue, agrees it’s an issue, but may not understand the root cause because they don’t understand the math behind it. They send it off to the app services team, who look at the data coming in and the data going back. At first the app services team thinks everything is okay. Then they notice it’s a currency issue: the UI is asking for EUR but showing a dollar sign. The app services team sends it back to the UI team to fix. Because they’re separate teams and not members of the same team, the communication tends to be more formal. A team sending an issue to another team tends to spend more time “building its case” for why it’s the other team’s issue. Because neither really understands the other’s work, the issue can end up getting sent back and forth over the wall several times before a cross-team meeting is finally called. It may sound trivial, but this happens all the time. It’s even worse when those teams are in different time zones. Now a two-hour bug fix ends up taking a week as messages between the two teams are delayed a day going back and forth (but we’re saving money by having people in a lower cost geography, right?).
This is why teams should be split by business domain, not technical specialty. Keep the number of reasons teams need to interact with other teams to a minimum. Make sure the integration points are well defined. Don’t forget to add plenty of automated tests around those interfaces! This is also why the members of a team should be working in the same or adjacent time zones. The internal communications of a team need to be quick and easy, not delayed by time zones or formality.
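As a rough sketch of what a well-defined integration point plus a test around it could look like for the currency bug above: the payload shape (`amount` and `currency` fields), the function names, and the tiny symbol table are all assumptions for illustration, not anything from the book or from a real system.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical symbol table; a real UI would likely use a locale library.
CURRENCY_SYMBOLS = {"USD": "$", "EUR": "€"}


def validate_price_payload(payload: dict) -> None:
    """Contract check for the (assumed) price payload both sides agree on."""
    missing = {"amount", "currency"} - set(payload)
    if missing:
        raise ValueError(f"price payload missing fields: {sorted(missing)}")
    if payload["currency"] not in CURRENCY_SYMBOLS:
        raise ValueError(f"unsupported currency: {payload['currency']!r}")
    try:
        Decimal(payload["amount"])
    except InvalidOperation:
        raise ValueError(f"malformed amount: {payload['amount']!r}") from None


def format_price(payload: dict) -> str:
    """Format using the currency the service returned, never a hardcoded symbol."""
    validate_price_payload(payload)
    symbol = CURRENCY_SYMBOLS[payload["currency"]]
    return f"{symbol}{Decimal(payload['amount']):.2f}"


def test_price_is_shown_in_the_currency_returned() -> None:
    # The kind of automated check that would have caught the EUR/$ mismatch.
    assert format_price({"amount": "19.9", "currency": "EUR"}) == "€19.90"
```

A test like this lives with the interface, so whichever side breaks the agreement finds out from a failing build rather than from a ticket bouncing over the wall.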
Another organizational anti-pattern I see too often: splitting the developers into a new development team and a bug-fix team. If it’s just for a few weeks after a big launch, sure, go for it. As a permanent thing? It is terrible. Any time I’ve run into an organization with that structure, the people who came up with the approach have either never spent time developing code professionally or did so long enough ago that they can’t tell a semi-colon from a curly brace. They have lost sight of the fact that the people who created the code are the people who will fix it the quickest. Those same developers are also the people who need to know there was a problem so that they don’t make the same mistake again! The usual rationale given for this type of structure is to let the feature development team get back to work when there are a lot of bugs. If there are truly a lot of bugs, shouldn’t the organization be going all-hands-on-deck to fix the bugs before adding more features on top of a shaky foundation? You bet they should. Get quality under control and you won’t regret it. Don’t defer it; that’s the expensive solution in the long run. It’s usually easier to apologize for a delay than to apologize for bad quality. Two years in the future, almost no one will remember the delay, but they will remember bad quality for a long time.
To conclude these thoughts on Conway: all of the latest design and architecture patterns, cloud technologies, automated tests, design reviews, user acceptance testing, code quality tools, and funky mnemonic acronyms will not be as effective as structuring your organization to get the results you want. That structure needs to account for how people communicate with other people, how teams communicate with other teams, and general human behavior. Organize by business domain rather than technical specialty and you’ll generally get good results. There are times to do it other ways, but that’s generally to form a team that supports the domain teams or one with truly hyper-specialized knowledge. Team Topologies defines these teams as Enabling teams, Platform teams, and Complicated-Subsystem teams. While these three types of teams have their various reasons to exist, they are all there to support the teams that are aligned to the domain (the book calls these Stream-Aligned teams), not to replace them. Plan the collaboration between these teams carefully and you will be well on your way to successfully applying Conway’s Law.
Team Topologies isn’t a long book. I recommend giving it a read if you’re part of designing and running software teams, or even just a part of one.
And by the way, while greatly fictionalized for the sake of creating an example, “Sonia” is pretty heavily derived from a real world experience I’ve had. If you find yourself saying “that sounds hokey,” then that’s probably one of the true-to-life parts of the story!