Having a High Bus Factor

June 28th, 2005 | Enterprise Architecture

A friend of mine pointed out an interesting post by Scott Hanselman that used a clever phrase, “having a High Bus Factor,” which is to say: if the original developer of a bit of code is ever hit by a bus, you are toast.

The example that Scott gave was a particular regular expression that I just have to share.  To understand the context, read his blog.

private static Regex regex = new Regex(@"<[\w-_.: ]*><!\[CDATA\[\]\]></[\w-_.: ]*>|<[\w-_.: ]*></[\w-_.: ]*>|<[\w-_.: ]*/>|<[\w-_.: ]*[/]+>|<[\w-_.: ]*[\s]xmlns[:\w]*=""[\w-/_.: ]*""></[\w-_.: ]*>|<[\w-_.: ]*[\s]xmlns[:\w]*=""[\w-/_.: ]*""[\s]*/>|<[\w-_.: ]*[\s]xmlns[:\w]*=""[\w-/_.: ]*""><!\[CDATA\[\]\]></[\w-_.: ]*>", RegexOptions.Compiled);
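For anyone who would rather not decode that by hand, the expression appears to match seven shapes of empty XML element. Here is a rough Python sketch of that reading. It is not Scott's code: the names NAME, XMLNS, CDATA, EMPTY_ELEMENT and strip_empty_elements are made up for illustration, the character classes are read as \w and \s, and the hyphens are escaped because Python's re module rejects a range like \w-_.

```python
import re

# Hypothetical building blocks; the names are illustrative, not from the original code.
NAME = r"[\w\-_.: ]*"                        # tag name (the class also allows spaces)
XMLNS = r'\sxmlns[:\w]*="[\w\-/_.: ]*"'      # exactly one xmlns attribute
CDATA = r"<!\[CDATA\[\]\]>"                  # an empty CDATA section

# The seven alternatives, in the same order as the one-line expression.
ALTERNATIVES = [
    rf"<{NAME}>{CDATA}</{NAME}>",            # <a><![CDATA[]]></a>
    rf"<{NAME}></{NAME}>",                   # <a></a>
    rf"<{NAME}/>",                           # <a/>
    rf"<{NAME}[/]+>",                        # <a//>
    rf"<{NAME}{XMLNS}></{NAME}>",            # <a xmlns="..."></a>
    rf"<{NAME}{XMLNS}\s*/>",                 # <a xmlns="..." />
    rf"<{NAME}{XMLNS}>{CDATA}</{NAME}>",     # <a xmlns="..."><![CDATA[]]></a>
]
EMPTY_ELEMENT = re.compile("|".join(ALTERNATIVES))


def strip_empty_elements(xml: str) -> str:
    """Remove every match in a single pass over the text."""
    return EMPTY_ELEMENT.sub("", xml)
```

A single pass only removes what is empty right now: stripping the inner element of `<x><y/></x>` leaves `<x></x>` behind. The pattern also never checks that the opening and closing tag names agree.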

I must admit to having developed code in the (now distant) past that had a similarly high bus factor.  Nothing as terse as the above example, thank goodness, but something kinda close.  On two occasions, actually.  I look back and hope that I have learned, but I’m not certain that I have.

The trick here is that I do not know the developer who follows me.  He or she will know some basic and common things.  The problem lies deeper… It is where my expertise exceeds the ability of a maintenance developer to understand my code… that is where the break occurs.

So how do we avoid this?  How does a good developer keep from creating code with a High Bus Factor?

It isn’t documentation.  I have been using regular expressions for decades (literally) and the above code is wildly complicated, even for me.  No amount of documentation would make that chunk of code simple for me to read or maintain.

Pithy advice, like “use your tools wisely,” won’t help either.  One could argue that regular expressions were not being used appropriately in this case, and in fact, the blog entry describes replacing the expression because it performed poorly when larger files were scanned.  That isn’t the point.

I would state that any sufficiently powerful technique (whether regex, or the use of an advanced design pattern, or the use of SQL XML in some clever way, etc.) presents the risk of exceeding the ability of another developer to understand, and therefore maintain, it.

Where does the responsibility lie for ensuring that a dev team, brought in to maintain a bit of code, is able to understand it?  Is it the responsibility of the development manager?  The dev lead?  The original developers?  The architects or code quality gurus?  The unit tests?

Is it incumbent upon the original dev team to make sure that their code does not have a High Bus Factor?  If so, how?

I’m not certain.  But it is an interesting issue.

SOA: Introducing the Business Event Schema

June 27th, 2005 | Enterprise Architecture

We have an easy notion of the data dictionary: a description of the data at rest in an OLTP system.  But what about the data in motion?  That’s where the Business Event Schema comes in.

More than a simple XML schema, a business event schema is a description that contains the following elements:

  • The name of the business process that initiates the event.
  • The timing or frequency that this event will occur and the conditions under which it will occur.
  • The schema (an XSD will do here) of the fields and the data types for those fields.  In addition, a description of the domain values and their meanings in the context of the business event is required.
  • The mechanism by which this event will be published (to be used by consumers who want to gain awareness of the event).
  • The expected list of consumers of the event (note that the person doing the describing may not know, and probably never will know, the complete list of consumers.  That isn’t the point.  This is here to help communicate how this event schema can be used as part of a business process).

This is, IMHO, a key deliverable for any architect attempting to describe a business process and how the systems involved in that process can be integrated with one another in a real-time fashion.  This event-driven integration goes hand in hand with service-oriented architecture, in that the systems involved are loosely coupled, with explicit boundaries, using well-defined data schemas, and at a coarse-grained level of interaction.
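To make the five elements concrete, here is a minimal Python sketch of a record that holds one business event schema. Everything in it is an assumption made for illustration: the class name BusinessEventSchema, the field names, the missing_elements check, and the OrderShipped example are not part of any standard or product.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessEventSchema:
    """Hypothetical container for the elements listed above."""
    event_name: str
    initiating_process: str           # business process that initiates the event
    timing: str                       # when or how often the event occurs
    conditions: str                   # conditions under which it occurs
    xsd: str                          # path to (or text of) the XSD for the fields
    domain_values: dict = field(default_factory=dict)       # field -> {value: meaning}
    publication_mechanism: str = ""   # how consumers become aware of the event
    expected_consumers: list = field(default_factory=list)  # known to be incomplete

    def missing_elements(self):
        """Names of required elements that are still empty.

        expected_consumers is deliberately not required, since the
        describer rarely knows the complete list.
        """
        required = ["initiating_process", "timing", "conditions", "xsd",
                    "domain_values", "publication_mechanism"]
        return [name for name in required if not getattr(self, name)]


# Illustrative instance; every value here is invented.
order_shipped = BusinessEventSchema(
    event_name="OrderShipped",
    initiating_process="Order fulfillment",
    timing="Once per shipment",
    conditions="All line items picked and a carrier label printed",
    xsd="order-shipped.xsd",
    domain_values={"carrier": {"GND": "ground", "AIR": "next-day air"}},
    publication_mechanism="Topic on the message bus",
    expected_consumers=["Billing", "Customer notifications"],
)
```

A check like missing_elements is one way to keep a description from being filed as "just the XSD" with the process, timing, and publication details left blank.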

Interesting problem in VS 2003 and how to fix it

June 21st, 2005 | Enterprise Architecture

A team member and I found an interesting problem yesterday that I thought I’d share.  We found the problem by luck, and the fix was weird.  Perhaps there is an easier fix out there.

The problem manifested itself this way:

We needed to build our five different components into different MSI files (don’t ask).  Each of the five components refers to one or two “base class” assemblies that are included in each MSI.  Previously, we had a single solution for each component that creates the assembly and then builds the MSI.  Most of the assemblies end up in the GAC.

We were running into problems where we would end up accidentally installing two copies of a base class component into the GAC.

Our solution was to create a single solution file that builds all of the assemblies and builds all of the MSI files.  This way, we could use project references and we’d only get one version of a dependent assembly in any MSI file.

The MSI for installing Assembly A is very similar to the MSI for installing Assembly B, because A and B are very similar.  They both inherit from the same base objects.  The problem was this: After creating the new solution file, and carefully checking every MSI, it appeared that we had it right: MSI-A would install Assembly A, while MSI-B would install Assembly B. 

We saved the project and checked it into version control.  Then we ran our build script.  MSI-A would have Assembly A, and MSI-B would have Assembly A as well.  Assembly B was not included in any MSI at all!

Opening the project back up showed that, sure enough, MSI-B was defined to use the project output from project A, even though we specifically told it to use B.  Fixing the reference using Visual Studio didn’t help.  The moment we saved and reopened the solution, the MSI would once again show that it refers to the wrong Assembly.

The cause:

When project B was created, the programmer made a copy of all of the files of project A, and put them into another directory.  He changed the names a little and ran with it.  It never occurred to him to open up the Project file and change the Project GUID for the new project.

The project GUID is a unique id for each project.  It is stored in the project file, but the solution files and the install projects use it as well.  Since we had two projects in the same solution that used the same GUID, VS would just pick the first project with that GUID when building the MSIs.  As a result, we had two MSIs with Assembly A and none with Assembly B.

The fix we settled on was to open one of the two project files in Notepad and change the Project GUID.  Then we went through every solution file that referenced that project file and changed the referencing GUID value.  We had to be careful with our solution file that contained both projects, so that we left one project alone and added the other.
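If you suspect the same mistake in your own tree, a small script can find it faster than luck did for us. This is a minimal Python sketch, not something we ran at the time: the function names are made up, and the pattern assumes the GUID appears either as a ProjectGuid="{...}" attribute (which is how I recall the VS 2003 project files storing it) or as a ProjectGuid element.

```python
import re
import uuid
from collections import defaultdict
from pathlib import Path

# Assumed layouts: ProjectGuid="{...}" as an attribute, or <ProjectGuid>{...}</ProjectGuid>.
GUID_PATTERN = re.compile(r'ProjectGuid\s*(?:=\s*"|>)\s*(\{[0-9A-Fa-f\-]{36}\})')


def extract_project_guid(project_text):
    """Return the first project GUID in the text (upper-cased), or None."""
    match = GUID_PATTERN.search(project_text)
    return match.group(1).upper() if match else None


def find_duplicate_guids(projects):
    """projects maps a project file name to its text.

    Returns {guid: [file names]} for every GUID used by more than one file.
    """
    by_guid = defaultdict(list)
    for name, text in projects.items():
        guid = extract_project_guid(text)
        if guid:
            by_guid[guid].append(name)
    return {guid: sorted(names) for guid, names in by_guid.items() if len(names) > 1}


def scan_directory(root):
    """Read every .csproj and .vbproj under root and report shared GUIDs."""
    projects = {}
    for suffix in ("*.csproj", "*.vbproj"):
        for path in Path(root).rglob(suffix):
            projects[str(path)] = path.read_text(errors="ignore")
    return find_duplicate_guids(projects)


def new_project_guid():
    """A fresh GUID in the braced, upper-case form used in project files."""
    return "{" + str(uuid.uuid4()).upper() + "}"
```

The script only reports the clash. Changing the GUID in one project still means updating every solution and install project that refers to it, as described above.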

This worked.  The effect was odd.  I thought I’d post the problem and our solution in case anyone else makes the mistake of creating an entire project by copying everything from another project, and then putting them both in the same solution file.

Happy coding!

Does SOA create a new class of defect: passive-aggressive behavior?

June 4th, 2005 | Enterprise Architecture

I was having a discussion the other day about the reasons for using SOA.  If the likelihood of defects in a system is logarithmically proportional to the complexity of the system, I noted, then SOA is useful because you can create a collaboration of interacting systems, where each system is as simple as possible, and some logic moves to the collaboration or orchestration between them.

To which my friend replied: so if a team has 10 members, and one is not functional, the rest of the team can adapt, but if a team has 10 members, but communication is screwed up, then the team itself is dysfunctional.  That’s worse.  So, can SOA create dysfunctional collaborations?  Can we create a “team” of systems that hate each other?

What if one system is best served by mistakes that show up in another?  Can that system engage in passive-aggressive behavior with another system?  What about codependency?  Can two systems behave in a manner that is counterproductive to both, but makes both of them look effective from the outside?

Do our test plans need to start including common team dysfunctional behaviors as test scenarios?

Maintaining the ACID test in long running transactions

June 2nd, 2005 | Enterprise Architecture

I was reminded recently of the fact that long running transactions, especially those involving multiple databases, cannot be made to follow the ACID rules of database transactions.  On its face, this is completely true.  However, I’m thinking that there are mechanisms that could be used to allow the positive effects of ACID to remain, even when the actual implementation is not available in the automated manner we are used to.

As a refresher: A is atomicity (which means that the entire transaction has to occur or not occur… failure means to roll it back).  C is consistency (if part of a transaction breaks a rule, then the entire transaction fails), I is isolation (two people performing actions on the data should not affect one another), and D is durability (committed transactions are not lost when power fails or other adverse events occur).

So if a long running transaction causes a change in Database D1, is then transmitted to a remote system, and the next day affects Database D2 (where it could fail), then we lose both Atomicity (because the transaction was committed to D1 before it was known to be successful at D2) and Isolation (since a user could ask both databases for info in the meantime and get two different answers).

However, the positive effects of ACID come when viewed from the viewpoint of the user.  The user is not a concept.  He or she is real.  They have a goal and a purpose for using the database.  If you can present ACID-like interactions to them, then these flaws can be minimized.

In order to do this, I’d suggest that a “system of record” be kept separate from the systems interacting in the transaction.  An interaction with the “system of record” would occur at the last step of the long running transaction.  That interaction would only occur if all prior interactions were successful.  All users who want the “correct” information would be encouraged to check there.  This gives you a kind of atomicity, since a change would not occur in this system until all parts of the transaction are complete.

As with Atomicity, Isolation can be met from this location as well, since queries to this system would not return different results depending on the status of various transactions, until those transactions completed and updated the system.
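A toy Python sketch of the idea, under heavy assumptions: the class SystemOfRecord and the function run_long_transaction are hypothetical names, each step is modeled as a callable that returns True on success, and the steps run back to back here even though in real life they might be a day apart.

```python
class SystemOfRecord:
    """Hypothetical store that users query for the 'correct' answer."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key)

    def commit(self, key, value):
        self._data[key] = value


def run_long_transaction(steps, record, key, value):
    """Run each step in order; touch the system of record only at the end.

    steps is a list of callables (say, 'update D1', 'update D2') that
    return True on success. If any step fails, the system of record is
    left untouched, so readers never see a half-finished change.
    """
    for step in steps:
        if not step():
            return False
    record.commit(key, value)   # last step, reached only if all steps succeeded
    return True


# Illustrative use: D1 succeeds, D2 fails, so readers still see the old address.
record = SystemOfRecord()
record.commit("customer-42", "old address")
ok = run_long_transaction(
    [lambda: True, lambda: False], record, "customer-42", "new address"
)
```

The sketch says nothing about undoing the change already made in D1. It only shows why a reader of the system of record gets a single, consistent answer while the transaction is in flight or has failed.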

So while long running transactions don’t meet the ACID test, systems that support and detect long running transactions can be set up to provide the benefits of ACID transactions fairly readily.