Root Cause Analysis for Software Problems

March 31st, 2008 | Enterprise Architecture

Let’s assume that every problem worth solving has a cause.  Interesting assumption.  Reality: Any one problem may have many causes.  The causes may interact in complex ways.  How do we go about figuring this out?

We can use the technique of Root Cause Analysis (RCA) to brainstorm the reasons for why the problem is occurring. I’d like to demonstrate this technique to investigate the causes for JaBOWS (in a later post).  First, we need to lay some groundwork.

Many of you have heard of Root Cause Analysis.  The most common method is known as “5 whys” where you ask the question “why” five times. 

Problem: My car won’t start
Why: The battery is dead
Why: The alternator doesn’t work
Why: The alternator belt broke
Why: I allowed it to get frayed and worn
Why: I have not maintained the car on the proper schedule

That method is useful in some situations, but it leads to a single cause.  For systems of people and software, I’d rather use a fishbone diagram, aka Ishikawa diagram. 

The fishbone diagram is usually used to categorize the causes of systemic effects in product manufacturing.  In that space, we would use categories like: Price, Promotion, People, Processes, Place / Plant, Policies, Procedures & Product. 

Is this useful for system integration or IT Business Software?  Not really.  The listed categories miss the obvious problems caused by information or functional interdependencies.  In addition, the notion of ‘place’ is probably best replaced with ‘proximity.’


Doing a little analysis, I repurposed the standard categories to make more sense for Software Root Cause Analysis.  See the Software Fishbone Diagram above and the description of the categories below.

To demonstrate the value of this work, I’ll follow up in a later post by using this approach to take on the causes for JaBOWS.

Category: Definition of the Software Root Cause Category
Cost: Causes driven by or related to the cost of resources, time, materials, or licenses needed to create, manage, deploy, or maintain systems.
Culture: Causes driven by the culture of the producer organization, the customer organization, or the prevalent culture.
Context: Causes driven by the interrelationships between information, process, and/or functionality supported by the software or embedded within it.
People: Causes driven by the people involved in creating, managing, deploying, or maintaining systems.
Process: Causes driven by the business processes by which the system is created, managed, deployed, or operated.
Policy: Causes driven by the policies of the organizations that create, use, or maintain software.
Platform: Causes driven by the capabilities of the software systems used to create, manage, deploy, and maintain the software.
Proximity: Causes driven by the relative distance between people involved in the creation, management, deployment, and maintenance of the software.
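To make the categories a little more concrete, a fishbone diagram can be treated as a mapping from category to a list of candidate causes. The sketch below is only an illustration: the `Category` enum mirrors the table above, while the `Fishbone` class, its method names, and the sample causes are hypothetical and not taken from any real analysis.

```python
from collections import defaultdict
from enum import Enum


class Category(Enum):
    """The eight software root cause categories from the table above."""
    COST = "Cost"
    CULTURE = "Culture"
    CONTEXT = "Context"
    PEOPLE = "People"
    PROCESS = "Process"
    POLICY = "Policy"
    PLATFORM = "Platform"
    PROXIMITY = "Proximity"


class Fishbone:
    """Hypothetical container: one problem, many causes grouped by category."""

    def __init__(self, problem):
        self.problem = problem
        self._bones = defaultdict(list)

    def add_cause(self, category, cause):
        self._bones[category].append(cause)

    def causes(self, category):
        return list(self._bones.get(category, []))

    def render(self):
        # List causes in category order, one line per cause.
        lines = [f"Problem: {self.problem}"]
        for category in Category:
            for cause in self._bones.get(category, []):
                lines.append(f"  {category.value}: {cause}")
        return "\n".join(lines)


# Illustrative causes only; a real session would brainstorm these with the team.
diagram = Fishbone("Services are not reused")
diagram.add_cause(Category.PROCESS, "No step for checking existing services")
diagram.add_cause(Category.PROXIMITY, "Teams sit in different time zones")
diagram.add_cause(Category.PROCESS, "Design reviews happen after coding")
print(diagram.render())
```

Unlike the “5 whys” chain, nothing here forces a single answer: several causes can sit on the same bone, and several bones can be populated at once.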

Put perspective on both short and long term problems

March 28th, 2008 | Enterprise Architecture

I’m always a bit worried when someone has “the answer.”  Lots of red flags go up when someone tells me: this is the problem and this is how you solve it.  Perhaps I’m just that kind of person.

I had a recent exchange with Alex Maclinovsky over at Sun.  Very practical guy.  Love his stuff.  We were discussing the idea of whether a common information model was a valid thing to kill off JaBOWS.  He agreed that it would be a really good thing, but that it is impossible to create.  I think his point is that we should have a model to refactor services as time goes on, and thus create a consistent set of services in a bottom up manner.  (Alex: I hope I understood you correctly).

So, when Alex suggested that it is a good idea to use a common information model, but that it just ain’t gonna happen, I guess I reacted in my gut. 

That’s a very short term perspective.

The problem at hand, most days, is building services for your app to expose, that others can use.  That is the short term view, and it is VALUABLE.  That is where the rubber meets the road.  In that respect, I agree with Alex.  We need to have a way to create versions around services, and to migrate them towards a cleaner model as time goes on.  To that end, Microsoft Consulting Services has created an open source project to ease the transition of service versioning called the Managed Services Engine.  I recommend that all readers working in SOA on the MS platform check it out.

On the other hand, if we take a short term view, without considering the long term, we can still end up with JaBOWS.  And that is the battle I have staked out.  It is a longer term battle.

So when we define the problem, it is fine to recognize the value of “extensible services” but it is dangerous to ignore the need for a destination… a place where we want to extend them to.

The interesting thing, really, is that Microsoft has a number of internal initiatives around creating a common information model.  There are at least four that I know of.  All of them are fairly complete.  Of course, they are different.  So the problem is not that we cannot create a model that covers all aspects of the business.  The problem is that we can’t get it to be recognized as a common model that covers all aspects of the business.  The difficult word is ‘Common’. 

We are working on it.

That said, the areas where we agree are immensely valuable, and even the areas where we disagree offer value because, in any one area, there are two “right” answers which means that, if you want to get things done, it is easier to pick one of them rather than create a third.  So we get governance via politics.  Not efficient, but definitely effective.

So when we attack JaBOWS, let’s have short term solutions.  Let’s include service extensibility.  That’s great.

But let’s also include longer term solutions.  Let’s let the “impossible” remain as a goal, because you will find that by keeping that impossible goal in mind, we end up getting there. 

“So many of our dreams at first seem impossible, then they seem improbable, and then, when we summon the will, they soon become inevitable.”  (Christopher Reeve)

The Problem with Process

March 25th, 2008 | Enterprise Architecture

I love business process.  I hate “business process.”

I love the goals and concepts and important value that comes from drilling in and understanding the planning, management, and operational measurement of business processes.  As I’ve said before, business process forms one of the three legs of the Enterprise Architecture “Double Iron Triangle”.

What I hate is the term “Business Process.”

The term “business process” is recursive.  A process can be made up of other processes.  As a result, a business process can be anything from a broad organizational flow (like “Order to Cash”) to a very specific instance of a line-of-business flow (like “Qualify lead gathered from general business conference for license agreement sale”).
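The recursion can be sketched as a small composite structure in which a broad flow and a very specific flow are instances of the same type. This is only an illustration: the `Process` class and the middle tier are hypothetical, and the nesting of the two example names from the text is assumed purely for the sake of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """A process that may itself be composed of other processes."""
    name: str
    subprocesses: list = field(default_factory=list)

    def depth(self):
        """Number of levels at or below this process (a leaf has depth 1)."""
        if not self.subprocesses:
            return 1
        return 1 + max(child.depth() for child in self.subprocesses)

    def leaves(self):
        """Names of the most specific processes in the tree."""
        if not self.subprocesses:
            return [self.name]
        names = []
        for child in self.subprocesses:
            names.extend(child.leaves())
        return names


# The first and last names come from the text; the middle tier is invented.
qualify = Process(
    "Qualify lead gathered from general business conference "
    "for license agreement sale"
)
lead_management = Process("Manage leads", [qualify])  # hypothetical tier
order_to_cash = Process("Order to Cash", [lead_management])

print(order_to_cash.depth())
print(order_to_cash.leaves())
```

Every node answers to the same name, “process,” whether it sits at depth one or depth three, which is exactly why the term alone says nothing about granularity.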

For Enterprise Architecture, this is a problem.  I guess that some other folks may care too. 

In many ways, this parallels the discussion of “what level of granularity do you use when defining your services” that we see pop up in the SOA community.  The level of granularity for a service goes to the understanding, planning, design, governance, and ultimately the ability to compose that service in novel ways.

And the same thing, ironically, happens when we don’t have a clear understanding of the level of granularity at which to describe a process.  If the goal is to create a “framework of process composability” that allows a process to be composed of ‘common’ subprocesses that have been optimized for key business objectives, then we need to know what to call these ‘building blocks.’ 

Clearly, the term ‘business process’ is too vague for that purpose.

JaBOWS is the Enemy of Enterprise SOA

March 17th, 2008 | Enterprise Architecture

As a community, we have sat silently by as the pundits have sold products that fail to deliver on the promise of SOA.  We have watched, many of us in horror, as the goal of changing behavior, and changing infrastructure, has fallen victim to “yet another tool” to solve the same problem.

Don’t get me wrong.  I don’t hate tools.  For one thing, there are some tools that support Enterprise SOA*.  Not many, but a few.  Those tools understand that Enterprise SOA is not about building one service after another, but building the right services, and building them in a manageable and non-overlapping way. 

What I hate is the notion that SOA can be reduced to tools; that you can introduce a tool and suddenly all the bad (human) behavior stops.  I want to dispel that notion right now.


  • If you take a group of well-meaning and intelligent engineers,
  • and you give them a process that looks like a normal software development process**, and you train them on it, and they believe that this process works…
  • and you add SOA…
  • you get JaBOWS (Just a Bunch of Web Services).

I did not invent the term “JaBOWS.”  Near as I can tell, Joe McKendrick did, a couple of years ago.  However, I am taking it one step further.  I believe that JaBOWS has specific causes and can be specifically avoided.  Not just with an executive sponsor, as Joe suggested back in 2005, but with a comprehensive Enterprise SOA transformational program, an approach designed to create a reusable infrastructure. 

Failing that, companies that invest in SOA are getting tripe, and the entire goal of achieving Enterprise SOA takes a hit.  We all lose when any one company kills their SOA initiative for lack of value.  In the SOA community, we are all invested in the success of each company that has bought the hype.  If we sit quiet, well before those initiatives fail, then we have no right to come back, two years from now, and say “well, it failed because you didn’t do it right.”  Or worse, “if you do it again, we can show you how to get value.”  That Ain’t Gonna Happen.

As a community, we have to do a better job of defining what it means to build an Enterprise SOA*. 

In Microsoft IT, we are using something we call “Solution Domain Architecture” or “SDA” to build an approach to services that, we hope, will result in the creation of an Enterprise SOA.  SOA is the benefit, SDA is the way to get there.  And the reason to use SDA: to avoid JaBOWS.

In order to force that growth, and leave the bad behavior behind, we have to declare war on JaBOWS. 

JaBOWS is the dead end that kills value.  It is all that is wrong with top-down approaches that produce unusable services, or bottom-up approaches that produce overlapping and badly coordinated piles of services.  JaBOWS is the costly, time-consuming, valueless exercise that so many companies have taken upon themselves in the name of SOA. 

Join me now.  Join me in decrying the creation of piles of useless and valueless noise.  It doesn’t matter if it can be discovered, or governed, or built quickly, if it is not reusable.  It doesn’t matter if it is built on WS* or leverages the best security protocol on the planet, if it is not decoupled correctly. 

Join me by writing about JaBOWS, and talking about JaBOWS, and sharing experiences about how to effectively avoid JaBOWS.  Join me by sharing what is wrong with building too many things, none of which are actually usable outside their original context.  Join me, by discussing the processes by which developers build the right systems, not just the tools that we need to buy and the interface standard we need to adopt.  None of those solve the problem.

It’s not a tools problem.  It is a process and people problem.

Tools + existing processes = JaBOWS.   And that is baaaaaad.


* Enterprise SOA goes way beyond “making two apps talk using a web service interface.”  It is a systematic approach to developing an Enterprise-wide Service Oriented Architecture that actually allows information, process, and functionality to be composed in new ways, ones that were not foreseen by the authors of the services.  Until you have this, Web Services are just “interoperable COM.”    Without Enterprise SOA, you have JaBOWS.


** I am including agile development here.  There is nothing in Agile methods that makes the problem worse, but there is nothing that makes the problem better, either.  If you say there is, tell me what agile book, on what page, aligns the agile manifesto with Enterprise SOA development.  I have all the books right here. 

Right-sizing the Event Message

March 6th, 2008 | Enterprise Architecture

One challenge in designing Event Driven SOA exchanges is to decide how big your event messages should be.  I have an approach that I believe works, and can leverage information that probably already exists in your organization… in the data warehouse.

First a quick introduction to Event Driven SOA, for those folks who aren’t familiar with the concept. (Note: some folks use the term SOA+EDA to refer to this idea).

Event driven SOA is a subtype of Service Oriented Architecture that I believe is one of the few ways to build a good distributed system using SOA principles.  You start with the core principles of SOA and you add an additional constraint: messages will be passed using async publish-subscribe mechanisms to the greatest extent possible.

A great overview of the distinction between SOA and Event Driven SOA is provided by Jason Bloomberg in this Zapthink article.  An excellent source of information on EDA as an extension to simple SOA can be found by reading the blog posts of Jack van Hoof, a talented SOA architect and proficient blogger. 

One reality with Event Driven SOA is that you need to publish events to an infrastructure, like an ESB.  Those events can be sent to any number of subscribers, who will act on them.
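For readers who have not worked with publish-subscribe, here is a toy sketch of the idea. The `EventBus` class and the subscriber names are hypothetical, and delivery is synchronous and in-process purely for brevity; a real ESB would deliver asynchronously and across systems.

```python
from collections import defaultdict


class EventBus:
    """Toy stand-in for publish-subscribe infrastructure such as an ESB."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The publisher does not know who, if anyone, is listening.
        delivered = 0
        for handler in self._subscribers.get(event_type, []):
            handler(payload)
            delivered += 1
        return delivered


bus = EventBus()
received = []
bus.subscribe("NewSale", lambda event: received.append(("crm", event)))
bus.subscribe("NewSale", lambda event: received.append(("billing", event)))

count = bus.publish("NewSale", {"customer_id": "4554"})
print(count, received)
```

The publisher only names the event type; adding or removing a subscriber requires no change on the publishing side.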

One principle that I espouse is that your event messages need to have “sufficient information in order for the subscriber to decide if it should consume or discard the event without requesting further information from another system.” 

Another principle, and one that may appear to be in conflict with the above principle, is that event messages should be as small as possible, but no smaller.

This is important, because the subscribers may not be interested in all events of a particular type, but rather all events of a particular subtype. If the event message is too small, there may not be sufficient information for a subscriber to decide if they should act on the event, or throw it away. 

It is less expensive to put a reasonable amount of information in the event message, so that the subscriber doesn’t have to call back to the event publisher before deciding if it should discard the event or use it.  But how much information is “a reasonable amount?”  I can answer that.


Let’s say that you have a customer relationship management (CRM) system.  You want an entry to be added to the CRM system when a customer buys a product, regardless of how the customer got the product (from a partner, or a retailer, or an online promotion, etc).  This way, when the customer calls with a problem, you know what products they own, when they purchased them, etc.

But let’s say that your company has locations in Europe and the USA, so you have implemented two CRM systems.  The CRM system in Europe handles information about customers in the eastern hemisphere, while the CRM system in the USA handles information about customers in the western hemisphere.

A new sale is made.  An event message goes out: “new sale is made.”  The CRM system in Europe gets the message.  Does it act on the event?  That depends… is the customer in Europe? 

How much information is enough… or too much?

SOA only works if the event subscriber is independent of the publisher.  The two must be loosely coupled. 

However, with the scenario above, it would be really tempting to say that we should put some kind of ‘hemisphere code’ into the ‘new sale’ event.  That way, the subscribers (CRM systems) could decide if they should discard the event or act on it.  Adding this field, on the other hand, is coupling.  We have allowed the information requirements of a subscriber to influence the messages sent by the publisher. 

So, how many times can we do this?  What if we add 10 new subscribers, and each needs some variation (one wants a customer id, a second wants to know the marketing program tied to the agreement that is tied to the sale, another wants to know what family of products were included in the sale)?  If you meet all of their needs, the event message will be huge.

There has to be a fine line that allows you to send a message that is “big enough for most folks” but still requires some systems to occasionally ask for further data before proceeding.

The answer lies in the data warehouse schema

Ever seen a star schema?  It is a way of structuring information to allow a tremendous amount of flexibility in reporting, and is used extensively in business intelligence and data warehousing.  In the center of a star schema is a ‘fact table’ that contains a row for every ‘fact.’  This row usually has a numeric value in it along with identifiers from a set of ‘dimension tables’ that surround it.  Each dimension may have another dimension behind it, creating a ‘star’ pattern. 

For our scenario, I created the following star schema.  Note: this is off the top of my head, and is not indicative of any actual system, in Microsoft or anywhere else.


The fact table is labeled ‘Sale’ and sits at the center of the star.  The numeric ‘fact’ is probably the dollar amount allocated to the sale of a particular item on an invoice.  (commonly, in the warehouse, each row of an invoice is represented independently, to allow information to be easily aggregated by product).

Of course, deciding what dimensions to use in your fact table requires a great deal of analysis.  The effort to create the dimensions requires some understanding of “what things are important to the business.”  These star schemas change fairly rarely, because once they are in place, and a million facts are loaded up, it can take a huge amount of effort to add a new dimension.

And this stability is useful.  We can use it to create the decoupling that we need.

The right sized event message

If you look at the Sale schema in the data warehouse, as illustrated above, there are a series of tables that form the first tier in the dimensions.  I have colored them orange, and they include things like product, date, fulfillment type, agreement, and customer.  Clearly, these are things that are important to the business.

I maintain that the ‘right sized event message’ will contain information from most, if not all, of the dimensions that the event would require in a data warehouse.  Therefore, for this particular event: “new sale,” a good XML structure may be:

<Message ID="69BB8E06-5168-4D46-AC1A-1F0563374805">
    <Event type="NewSale">
        <Sale ID="sale/234434">
            <Customer ID="4554"/>
            <Agreement ID="CCG9934"/>
            <Date ID="14/02/2007"/>
            <Fulfillment Type="03"/>
            <Value Total="24454.04" Tax="310.54"/>
        </Sale>
    </Event>
</Message>

Some notes here:

  1. I formatted the message above using a simplified, readable XML layout.  I am not implying that event messages must be in any particular format.  That is up to you.  I’m just trying to be readable on my Blog.
  2. Careful readers will notice that one of the dimension tables is ‘product’ but I don’t have a list of products in the event message.  That is because there can be quite a few products in a ‘sale.’  I try to avoid putting an open-ended list into an event message. 
    Clearly, there are tradeoffs here.  (Aren’t there always?)
  3. Clearly, the star schema contains branches.  I didn’t reflect every branch beyond the first tier dimensions.  This is an effort to keep the message small. 
  4. The Sale has a sale id of ‘sale/234434’ which is not a full URI.  I’m not making a statement for, or against, using a URI for an enterprise identifier.  That is up to you.  Similarly, I used a GUID to identify the event message itself.  Whatever id you use to identify the message itself, that is up to you.
  5. You have undoubtedly noticed that the event message contains identifiers for the elements to be linked to, not full details.  Therefore, the subscribing system will need to be able to access the values for customers and/or agreements from the messaging infrastructure if that information is needed.  There is no attempt to be comprehensive.  Simply sufficient.
  6. You will notice that I put the facts about the sale value into the event message.  This is entirely optional.  However, it is reasonable to predict that a message consumer may want to differentiate the message based on the facts that sit at the center of the star schema.

Remember our scenario?  We said that we implemented a CRM system in the USA and another one in Europe.  The event message above comes in to the CRM system, but we don’t have an indicator if the customer is in the USA or Europe.  So did we meet the criteria?  Is this event message large enough, without being too large?  Is it decoupled enough?

I believe that the answer to both questions is “Yes.”  This is because we included the customer id. Clearly, a CRM system would know what customers it has.  The CRM system would do a quick lookup against its own database to find the customer before deciding to process the message.

This allows the subscriber (CRM) to decide quickly if it needs to process the message or not, without going back to the publisher to ask for useful information like “what is the id of the customer.”  It also allows the publisher to be independent of the subscriber, while still providing the fields that the business considers important for understanding the transaction itself.
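The consume-or-discard decision can be sketched as follows. This is a rough illustration, not a real CRM integration: the `CrmSubscriber` class, the sets of customer ids, and the trimmed message (carrying only the fields the subscriber inspects) are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the "new sale" message, for illustration only.
EVENT_XML = """
<Message ID="69BB8E06-5168-4D46-AC1A-1F0563374805">
    <Event type="NewSale">
        <Customer ID="4554"/>
    </Event>
</Message>
""".strip()


class CrmSubscriber:
    """Hypothetical regional CRM that only keeps sales for customers it owns."""

    def __init__(self, region, known_customer_ids):
        self.region = region
        self.known_customer_ids = set(known_customer_ids)
        self.processed = []

    def handle(self, message_xml):
        root = ET.fromstring(message_xml)
        event = root.find("Event")
        if event is None or event.get("type") != "NewSale":
            return False
        customer = event.find(".//Customer")
        if customer is None:
            return False
        # Local lookup only: no call back to the publisher is needed.
        if customer.get("ID") not in self.known_customer_ids:
            return False
        self.processed.append(root.get("ID"))
        return True


# Invented customer ids: 4554 is assumed to belong to the European CRM.
europe = CrmSubscriber("Europe", {"4554", "7001"})
usa = CrmSubscriber("USA", {"1200"})
print(europe.handle(EVENT_XML), usa.handle(EVENT_XML))
```

Neither subscriber needed a hemisphere code in the message; each one answered the question from its own customer data.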


One practice to consider: when deciding what fields should be in your event messages,  look at the information captured in the tables of your data warehouse.  This applies especially to relevant fact tables and first level dimension tables. 

If you can find a star schema in your BI database that matches the entity you are trying to model, consider yourself lucky. In that case, use the dimensions of the fact table to decide what values the company cares about, and model your events to contain most, if not all, of those values.
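As a rough sketch of that practice, the field list for an event can be derived from a description of the star schema: one identifier per single-valued first-tier dimension, plus (optionally) the measures from the fact table. The `SALE_STAR` dictionary, the `open_ended` flag, the helper functions, and the sample row are all hypothetical; they only mirror the scenario described in this post.

```python
# Hypothetical description of the 'Sale' star: first-tier dimensions and
# whether each one is a single key or an open-ended list per sale.
SALE_STAR = {
    "fact": "Sale",
    "measures": ["total", "tax"],
    "dimensions": {
        "customer": {"open_ended": False},
        "agreement": {"open_ended": False},
        "date": {"open_ended": False},
        "fulfillment_type": {"open_ended": False},
        "product": {"open_ended": True},  # many products per sale
    },
}


def event_fields(star, include_measures=True):
    """One identifier per single-valued first-tier dimension, plus measures."""
    fields = [
        f"{name}_id"
        for name, info in star["dimensions"].items()
        if not info["open_ended"]
    ]
    if include_measures:
        fields.extend(star["measures"])
    return fields


def build_event(star, sale_row):
    """Project a sale record down to the chosen event fields."""
    return {name: sale_row[name] for name in event_fields(star)}


# Illustrative row; the product ids and notes field are invented.
sale_row = {
    "customer_id": "4554",
    "agreement_id": "CCG9934",
    "date_id": "14/02/2007",
    "fulfillment_type_id": "03",
    "product_ids": ["P1", "P2"],
    "total": 24454.04,
    "tax": 310.54,
    "internal_notes": "not part of the star",
}
event = build_event(SALE_STAR, sale_row)
print(event)
```

Because the field list is driven by the schema rather than by any one subscriber, a new subscriber does not change the message unless the business adds a dimension.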

Selling to Executives – Investigate their metaphors

March 3rd, 2008 | Enterprise Architecture

This tidbit came to me indirectly.   I was having a meeting with a talented architect this afternoon, and after the meeting wrapped up, we were chatting about some of the different tactics we’ve seen for “selling an idea to an executive.”  At the end of the day, the ability to influence an executive is a core competency for Enterprise Architects.

So this architect, whom I will name “Bob” to protect his identity, points out the problem of the “sidetracked metaphor.”  It happens like this:

  • You are working hard to take a really complex idea and turn it into something that a business executive can get his or her head around, without too much difficulty.  Something that you can explain, and that rings true.
  • You come up with a metaphor that you find compelling.  Perhaps you run it past some of your friends, and they find it compelling too.  So you put the metaphor into a PowerPoint deck and work all your content around it.
  • You give the presentation.
  • The executive stops you:  He doesn’t understand the metaphor, or worse, he doesn’t agree with it.  He flips the bozo bit on you.  Your idea is dead.  Doesn’t matter if the idea will save him millions.  Actually, it does matter… because the next person to present the idea, using a metaphor that he likes, will get his ear.  THAT person will convince him.  You won’t.


So how do you prevent this?  Bob gave me a good idea…

Investigate the things that the executive has said, or written about.  Investigate other programs or ideas that he has signed up for.  Investigate business books that he has mentioned.  Investigate products that he loves to talk about, and even activities he may be involved with.  You are looking for the metaphors.

Create a list of metaphors that your target executive understands.  What metaphors have driven him to action?  What metaphors does he use in his own presentations?

Leverage that knowledge.  Pick a metaphor that is related to, or ties in neatly with, one that he or she is already familiar with.  Reuse a metaphor if it won’t create confusion.  Put that metaphor into your deck.

Your odds of getting your idea into the head of your executive just doubled.