Stateless is to service as commandless is to data

October 25th, 2004 | Enterprise Architecture

Abstract: This article provides a counterpoint to Pat Helland’s most recent article by describing the concept of “commandless data documents,” and how they are logically used in Service Oriented Architecture.


I was re-reading Pat Helland’s latest SOA article “Data on the Outside, Data on the Inside” recently.  It is a good step forward in the discussion of how data is managed in an SOA environment.  I believe that one aspect of the discussion is lost in this article and many of the others that I have read: Something I like to call “commandless” data.

In Mr. Helland’s article, he states:

In service oriented architecture, the interaction across the services represents business functions.  These are referred to as operators.  The following are some examples of operators:

  • Please PLACE-ORDER

I don’t get a chance to disagree with his articles very often, but this time, I must.  The above statement is often incorrect because it assumes that the document represents a command, and not a statement of fact. 

To wit: when a purchase order passes from a buyer to a seller, I assert that the document is NOT a request.

A statement of fact

The sending system has already entered the document as State information.  It is not, in all likelihood, waiting for a response before booking the purchase order as an outstanding obligation.  No, the purchase order is not a request to purchase… it is a statement that the buyer expects the seller to follow through on a contractual arrangement that will result in the transfer of goods and funds.   This statement of fact is already on the balance sheet of the buyer.  It already affects financials, projections, and, in all likelihood, customer-facing statements about the availability of products further along in the supply chain.

Viewing a purchase order as a request assumes that the sender knows what must be requested, which means that the sender is sending a command to the receiver.  Viewing the purchase order as a statement of fact means that the sender cannot know what must happen from a system standpoint on the receiver’s end.

This is a minor shift in the point of view, but a very important one.

The fantasy of “Once and only once”

The first time the seller gets the document, the seller may (or may not) be surprised by it.  “Look, ma, someone wants to buy something!”  However, the behavior of the receiving systems is not “add” the purchase order!  On the contrary, it is “Does this order already exist?  If not, add it.”  That is because the order may very well already exist.  In fact, how can the sending system know that it doesn’t?

The systems of business have to allow for information to come from multiple sources.  There is no concept of “send it once and only once.”  People use telephones, fax machines, e-mail, online ordering, snail-mail, calls to the customer service rep, and indirect purchases (e.g. from a dealer) to indicate that product needs to move from place to place.  I call this an “open channel” because the same message may arrive, in different formats from different sources with the same content.  The sending system cannot assume that the receiving system does not already know about the transaction, nor should it presume to dictate what the receiving system should do with it.
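To make the "open channel" behavior concrete, here is a minimal sketch of a receiver that checks before it adds.  (This is Python pseudo-logic with invented names, not code from any real system; the in-memory dictionary stands in for the order table.)

```python
# The same purchase order may arrive by EDI, fax transcription, phone,
# or web entry, so the handler asks "does this order already exist?"
# before adding it.  A duplicate arrival is normal, not an error.
orders = {}  # order_id -> order document (stands in for the order table)

def receive_purchase_order(doc):
    """Record a purchase order as a statement of fact, idempotently."""
    order_id = doc["order_id"]
    if order_id in orders:
        return "already known"   # the fact was already on record
    orders[order_id] = doc
    return "recorded"
```

Send the same document in twice and the second call is a harmless no-op, which is exactly the property the open channel requires.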

Never a command?  C’mon…

That is not to say that documents must never contain a command, but they would be commands for information, not normally commands for functionality on the receiving end.  By this, I mean that some messages are, of and by themselves, useless without referencing an ongoing transaction.  For example, if I ask for the status of an order, I am referring to an order number for an order that already exists.  The receiving system would respond with information about the status of the order.  A list of response codes may include…

  • Already shipped
  • Shipment planned on schedule.  Expected ship date: 11/01/04
  • Shipment planned outside of expected schedule: Expected ship date: 12/01/04
  • Shipment accepted and vendor commits to fulfillment, but ship date not known.
  • No shipment planned.  No commitment is made.
  • Problems detected in processing the order.  The order is on hold until manual resolution.
  • I’ve never heard of that order.

On the other hand, if I want to transfer payment from one account to another, my document does not contain a request for the transfer… it simply states that the sending system considers the transfer to be a fact, and would like the receiving system to confirm that it, too, believes the same to be true.

This way, if the same document is sent from the sending system to the receiving system 25 times, the transaction still occurs only once on the receiver.

Note: Acknowledgements are a special case.  If I send a document to a system, and get an acknowledgement back, I do not expect to get back the entire submitted document.  I would expect to get back my sending identifiers and the receiving system’s unique identifier for the same document.  I would want to ASSUME that the document that I referenced as “Contoso transaction 12” and which was acknowledged as “Contoso transaction 12” was not changed by the receiving system before being acknowledged.  If this is not the case, then the protocol would need to handle these situations.
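A hedged sketch of what such an acknowledgement might carry (the field names here are my own invention, not a real protocol): echo the sender's identifier, add the receiver's own identifier, and leave the document body out.

```python
import uuid

def make_acknowledgement(doc):
    """Build an acknowledgement for a received document: the sender's
    reference echoed back, plus the receiver's unique identifier for
    the same document -- never the full submitted document."""
    return {
        "sender_ref": doc["sender_ref"],     # e.g. "Contoso transaction 12"
        "receiver_ref": str(uuid.uuid4()),   # receiver's own identifier
    }
```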

I can respond to a question… how do I respond to a fact?

This ability to respond to a fact, rather than a request, is essential when creating a new system in the world of SOA.  This is because we need to be able to make statements that are detached FROM EACH OTHER.  Sure, some documents are going to cross reference (like my status request).  However, the system should function, and function well, without having to coordinate many of the incoming documents.

For this reason, if a purchase order arrives from a new buyer, the purchase order doesn’t need to be preceded with a “add new customer” order.  The purchase order establishes the customer if he or she doesn’t already exist. 
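As a rough illustration (names invented, in-memory dictionaries standing in for tables), a receiver in this style establishes the customer as a side effect of recording the order:

```python
customers = {}        # customer_id -> customer record
purchase_orders = {}  # order_id -> order document

def receive_order(doc):
    """The purchase order itself establishes the customer; no prior
    'add new customer' command is required."""
    cust = doc["customer"]
    if cust["id"] not in customers:
        customers[cust["id"]] = cust          # created from the fact itself
    purchase_orders.setdefault(doc["order_id"], doc)
```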

This is the concept of “commandless data.”  This is as powerful a concept for integration as “stateless transactions” are for scalable web page requests.  With commandless data, we get simplified interactions, scalable messaging, and protocols that are reliable, even when our networks are not.

This is why I love this stuff.

Using Service Oriented Architecture concepts for database replication

October 21st, 2004 | Enterprise Architecture

Is SOA really useful for database replication?  Yes and no.  This posting will discuss a dilemma that a project of mine faced, and some of the principles that I needed to share with the developers before they could see (a) the problem, and (b) the solution using SOA.

The problem we faced

Here is a specific problem.  Imagine that you have two separate web sites, hosted in their own environments behind firewalls (in their own DMZs).  Machines in area A cannot route packets to machines in area B and vice versa.  No outbound communication, period.  There is a database in area A that needs to be synchronized with another database in area B.  These databases are NOT identical.  They are, in fact, parts of two different applications.  However, there are some tables where rows originate in area A and have to get to the database in area B, and some tables where rows originate in area B and have to get to the database in area A.

(This is not as bizarre as it may sound.  In one of the dot.com companies where I was a co-founder, we had two hosting environments: one for public users and one for business partners.  They were hosted in different physical locations.  The systems had different SLA requirements and different security requirements… and they couldn’t route packets between themselves either.  Deja vu.)

So, how to do it?

Well, as you can imagine, there has to be a way for someone to communicate with the hosting server, and there is.  From the corporate network, I can access area A over port 80.  I can access area B over many ports, including port 80.  The traffic between my desk and these areas is entirely secure… it does not leave the corporate network and travel on the open internet at any time.  While the servers in area A cannot initiate traffic to me, they can respond to a request that I send in.  Same for area B.

For shorthand, and to make this discussion shorter, I will pretend that there is only one machine in each area, and I will call them Server A and Server B.  The concepts are the same.  Also, for the sake of simplicity, let’s pretend that each machine is both an application server and a database server.  They are NOT in the same Windows domain, nor does the domain in either side trust the domain on the other. 

Our first solution

We decided to create a little Windows service that would run in the secure corporate network.  The service would make a request from Server A, get a bunch of data, and then pass it, as a request, to Server B.  Server B adds it to the database.  The service would then go in reverse, and make a request from Server B.  The service would pass the response to a web service on Server A, which would store the data.

Seems pretty straightforward, and we weren’t planning to use SOA at all.  The flow from A to B would start by requesting, from a web service on Server A, all of the rows from a specific table that were older than a specific time.  The response would contain a small number of rows (fewer than 10,000).  We would pass these transactions to Server B one at a time.  We always ran them in a specific order (to get around any particular dependencies between the tables).

Let’s see a show of hands for everyone who can see the problems with this.

  • First off, all the data coming from Side A, and going to Side B has to be extracted in a single transaction.  This is because there are relations between the tables, and we can’t run the risk that, in between extracting data from a header table, and extracting data from a details table, someone could have inserted data into both.  That would make the data in the details table newer than the data we just extracted for the header table.   We’d get foreign key errors on the destination side.
  • Secondly, it scales badly.  Since the data feed has to occur as a large pile of data (see the first problem), we now have a batch process.  If the amount of data that has to be moved starts to really grow, we have to be concerned with the systems impact that occurs when this process kicks off.  Will the sites slow down?  Will they run at all?  We start going to 4am processes… which means our data can become many hours old (and usually will).
  • Thirdly, if we increase the number of servers, on either end, coordination gets to be hairy almost instantly, because a destination can either have a large file update, or not… which means that the results of a query on one destination are not going to differ from another destination by only one record… they are going to differ by many records, possibly thousands of records.

Here’s where I say: SOA fixes everything.  Well, it won’t give us world peace, but it does fix these flaws in a reasonable manner.

First off, to make life easier, we put a web service both on Server A and on Server B.  The web service would handle the job of talking with the local database server.  Now, we can communicate over port 80 from “replication central” on the corporate network.  This makes the security guys happy, because we don’t need other ports.  Also, the code in each web service is very similar, so it can be derived from the same code base, thus reducing the amount of code to test and maintain.

OK, so our infrastructure now has a database and a web service on Server A, and the same on Server B.  We have a windows service in the corporate network that can access both web services and make requests.  We will call this “replication central” or the RC.  One more important detail: the values used for primary keys, for the tables on BOTH sides, are GUID values.  We do not need to match against some arcane query on the destination side to determine if a row is already there.  This is all just component driven architecture.

Here’s where it becomes SOA. 

When the RC calls up server A, we won’t ask for a table.  We will ask for a business document.  The document will contain one independent header row and all detailed rows (and more… read on), in a single .NET object serialized to XML.  I can’t describe the content of this object just yet… not until we work out a few problems.

What rows do we send?

First problem is this: “how do we know what header row to send?” 

I don’t like dates for this kind of thing, so we are putting in “dirty flags” for all of the header rows.  The logic works like this: If the application updates any row in a table that gets replicated, we will find a header row (in that or another table) that should get its “dirty” field set to true.

For example: let’s say we have an invoice header, and invoice lines, and each line can contain further detail lines (an invoice of clothing, where a line is “Docker Slacks” with sublines for a quantity broken out for each size and style).  Now, someone updates the quantity of plaid golf pants in Men’s size 42 from 60 pairs to 80 pairs.  The invoice header will get the dirty flag set, because when we send the business document from one place to the other, it is the business document that decides what detailed rows to pull in, including this one.
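The propagation rule can be sketched like this (the table shapes below are stand-ins for illustration, not the project's actual schema):

```python
# Any update to a detail row marks the owning header row dirty, because
# the header (the business document) is the unit of replication.
invoices = {
    "INV-1": {"dirty": "clean",
              "lines": {("plaid golf pants", "M42"): 60}},
}

def update_line(invoice_id, line_key, qty):
    """Change a detail quantity and flag the whole document for re-send."""
    inv = invoices[invoice_id]
    inv["lines"][line_key] = qty
    inv["dirty"] = "dirty"   # the whole business document will be re-sent
```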

Domain tables

Another problem comes from domain tables.   Let’s say we have a “country” table, and a user enters a new country on Server A, and then adds a customer who lives in that new country.  When we send the new customer over the wire, we have to make sure that the new country gets registered first.  Should we have a separate transaction for countries?  After all, they are independent values… or are they?

I would state that the country table is not a business document.  It is a convenience: a data standard that all customers must adhere to, but not a business document.  Therefore, when we create transactions, in this case, we would not have a transaction called “country.”  Instead, we would send the customer record from Server A to Server B, and in the document, we would have all of the information needed to create a country record for this user in the event that this is a new country.

So a snippet from the XML could look like this:

        <Line1>123 Main Street</Line1>
        <Line2>Apt C</Line2>
        <Country ID="{738A7639-C24D-42a6-8DB7-52C07B62781F}"
            Code="USA" Desc="United States">USA</Country>

Notice that the Country tag has attributes that completely define the country of the customer.  That way, when the customer information is passed from one machine to another, we can check to see if the country “USA” already exists.  If it does not, we add it using the attribute data.  Otherwise, we simply use the key value in our customer record on the destination side.

One caveat to this: this idea applies to simple domain tables. It is not useful when an otherwise small transaction refers to a much larger one.  For example, if a transaction represents a request for the status of a shipment, the shipment itself should not (normally) be contained in the request.  In this case, the list of possible responses would include the response: “never heard of that shipment id.” 
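Here is one way the destination side might apply the Country attributes, sketched in Python with xml.etree.ElementTree (the in-memory "table" is a stand-in for the real database; the function name is mine):

```python
import xml.etree.ElementTree as ET

countries = {}  # Code -> {"id": ..., "desc": ...}  (destination domain table)

def ensure_country(country_elem):
    """If the country carried in the customer document is unknown on the
    destination side, add it from the element's attributes.  Either way,
    return the key value to store on the customer record."""
    code = country_elem.get("Code")
    if code not in countries:
        countries[code] = {"id": country_elem.get("ID"),
                           "desc": country_elem.get("Desc")}
    return countries[code]["id"]
```

Feeding the same Country element through twice adds one row and returns the same key both times, so duplicate documents remain harmless.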

Is it an update or an insert?

One common problem is that the data that comes over in a feed has an indicator that says “is this new data or updated data?”  My answer is “who cares.”  Really.  If a document exists on one side, it must exist, in exactly the same format, on the other side.  There’s no variation in this concept.  The source side has a single representation for an object, and it knows that some element, somewhere in it, has changed.  Given the concept of a single dirty flag on the header record, the source system is not actually able to say which field changed to cause the need for an update.  The document is simply dirty and needs to be sent again. 

An important notion is that the document does NOT contain any instructions for how the receiving end is supposed to process it.  That is up to the receiver.  The sender has no say in the matter.  For the sender to be aware of this creates coupling between the sender and the receiver, and that is not healthy.

How do we handle transactions?

In our situation, we can’t create a distributed transaction.  We have web services.  We have data in a document.  So, how do we know if a transaction has been successfully applied?  Go back to old EDI principles for this one: the receiver sends back an acknowledgement.  So the Replication Central (RC) service asks Server A for a document (just one).  In a stateless response, Server A gives up a document.  RC sends that document to Server B, on a synchronous call.  Server B applies logic to decide if rows need to be added or updated (this is why the GUID keys are so important).  If it works, Server B replies with “Thank you” or “That didn’t work… and here’s why”.  (Architecturally, this could be done asynchronously, but doing it synchronously avoids the issue of handling callbacks when Server B cannot send unsolicited messages to the RC.)

Now, the RC knows that the transaction was applied, but Server A does not.  So the RC makes another call to Server A, this time just to acknowledge the document and to turn off the “dirty” flag. 
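The whole round trip can be sketched as a single function.  (A hedged outline only: fetch_document, apply_document, and acknowledge are my placeholders for the three web service calls, not real APIs.)

```python
def replicate_one(fetch_document, apply_document, acknowledge):
    """Move a single dirty document from source to destination, then
    acknowledge it back to the source so the dirty flag can be cleared."""
    doc = fetch_document()            # Server A: hand over one dirty document
    if doc is None:
        return False                  # nothing to replicate right now
    result = apply_document(doc)      # Server B: insert-or-update by GUID
    if result == "ok":
        acknowledge(doc["id"])        # Server A: clear the dirty flag
        return True
    return False                      # leave the flag set; retry later
```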

The logic of the dirty flag is not boolean.  An item is not “clean” or “dirty”.  There are three states.  It is “clean”, “dirty”, or “in process.”  When the RC asks for a record, the web service on the database server needs to get one record that is marked as dirty.  It will compile a document.  Then, before sending the response, it will mark that record as “in process”.  This is still dirty, but there’s a catch.

If the user modifies a record when it is “in process”, we set the dirty flag back to “dirty”.  (the logic here is simple, since we are already setting the dirty flag when we modify a record… we don’t care if it was “clean” or if it was “in process”).  Note, we also need a date field for when the dirty flag value is changed (the reason will be clear in a moment).

The distinction comes when the acknowledgement is received.  If we acknowledge a record that is “in process”, we should set it to “clean”.  However, if we acknowledge a record that has been updated after the transaction was sent out (and is therefore “dirty”), it stays “dirty”.  We have to send it again with the new updates.
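The three-state logic above can be written out as a small transition table (a sketch; the state names follow the text):

```python
def on_user_update(state):
    """Any modification re-dirties the record, whatever its state was."""
    return "dirty"

def on_send(state):
    """A dirty record being compiled into a document goes 'in process'."""
    return "in process" if state == "dirty" else state

def on_acknowledge(state):
    """Only an untouched in-process record becomes clean.  A record
    modified while in flight is 'dirty' again and stays that way, so
    it will be sent again with the new updates."""
    return "clean" if state == "in process" else state
```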

Why do this one at a time?

You may have noticed already that we are only moving one document at a time.  The logic that the web service should use when retrieving a record can be best described in Structured Query Language: SELECT TOP 1 * FROM HeaderRow WHERE DirtyFlag = 'dirty'

If that query returns nothing, then (and only then) try this: SELECT TOP 1 * FROM HeaderRow WHERE DirtyFlag = 'in process' and (in English now) it has been more than an hour since the dirty flag was last modified.

This gives you one record that “really should be sent” to the other side.  Why wait an hour?  You could make it a minute if you want.  I like leaving a window.  That way, if you start up four copies of your Replication Central service, they won’t conflict with each other.  In other words, it scales nicely.
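A runnable approximation of those two queries (using SQLite, where TOP 1 becomes LIMIT 1; the table and column names follow the text, but the FlagChanged timestamp column is my own addition for the "more than an hour" test):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE HeaderRow (
    id TEXT PRIMARY KEY, DirtyFlag TEXT, FlagChanged REAL)""")
# one freshly dirty row, and one that went "in process" two hours ago
conn.execute("INSERT INTO HeaderRow VALUES ('g1', 'dirty', ?)", (time.time(),))
conn.execute("INSERT INTO HeaderRow VALUES ('g2', 'in process', ?)",
             (time.time() - 7200,))

def next_document(conn, stale_after=3600):
    """Pick one dirty row; failing that, one in-process row whose flag
    was last changed more than stale_after seconds ago."""
    row = conn.execute(
        "SELECT id FROM HeaderRow WHERE DirtyFlag = 'dirty' LIMIT 1").fetchone()
    if row is None:
        row = conn.execute(
            "SELECT id FROM HeaderRow WHERE DirtyFlag = 'in process' "
            "AND FlagChanged < ? LIMIT 1",
            (time.time() - stale_after,)).fetchone()
    return row[0] if row else None
```

With both rows present, the dirty row wins; once it is cleaned, the stale in-process row is picked up instead, which is exactly the retry behavior the hour-long window buys.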

Also, since we are only sending one transaction at a time, then the RC can run fairly continuously.  We can query the database continuously until it returns with “no changes” and then sleep for 10 minutes before we ask again.  We don’t have to worry about “processing windows” or slowing down the server with the overhead of replication.

Final notes

A detail that should be obvious is that each transaction has to be uniquely identifiable.  Since we are using GUIDs for our keys, we can use the GUID of the header row as the transaction id.  That way, when the acknowledgement comes back, we know what row to apply it to.

One more note: I had said that data needs to go in both directions: from A to B and from B to A.  In our system, we are lucky… We never have the same table that can be modified on both servers at the same time.  I do not claim that this process will work with that situation, because data merging comes into play.  That said, the concepts can probably be expanded to cover data merging… but I don’t have time to think about that right now.


Service oriented architecture isn’t about web services, or HTTP, or XML.  It is about data.  How data is managed.  How it is transmitted.  And how it is replicated.  This document describes one, very tangible, mechanism that we are using to transfer data between servers in a service oriented manner.

How to encourage your outsourcing partners to avoid waterfall processes

October 21st, 2004 | Enterprise Architecture

Wrote an article a few days back and posted it here: http://blogs.msdn.com/nickmalik/articles/243442.aspx

This article is directly aimed at folks who send software out to an outsource agency or overseas development shop, where the code is written and delivered back to you.  The article contains specific advice for how your RFP should read, or what your statement of work should say, to encourage the outsourcing vendor to use agile development methods instead of a waterfall software development lifecycle (SDLC), which nearly always fails.

Agile Vendor Management – removing waterfall from outsourced projects

October 16th, 2004 | Enterprise Architecture

The Tyranny of Waterfall in RFPs

When I was in consulting, we would routinely bid on RFP (Requests for Proposal) issued by government agencies and large corporations where the client would require the project to be performed in the worst possible model: Waterfall. (If you aren’t familiar with the Waterfall Software Development Lifecycle Model, you are probably using it.)

In this model, all analysis is done before any design… all design is done before any construction… all construction is done before any deployment… and nothing ever ships on time or within budget. These projects are very difficult to properly estimate. As a project’s size grows, so does the risk of delivering a late project with major features snapped off.

This was a problem as a consulting company, because we were routinely asked to deliver software for a fixed price with fixed requirements (fixed bid contracts). As a result, we were between a rock and hard place: asked to use a software development model that nearly guarantees failure, yet accept all the risk for failure.

Is it any wonder so many consulting companies go out of business?

How did we get here?

How does a buyer force a waterfall model in an RFP, you ask? It’s easy! Simply tell your vendor that they must first deliver the analysis document, then the design document, then the code. In that order. Even better, require that each document must be formally approved by the agency before the next step may begin. (This is a recipe for suicide.)

If this nearly always fails, why do it? It’s part of a vicious circle that comes from failure and lack of trust.  A project is described poorly and a consulting company, eager to get the bid, responds with a fixed price that is risky.  The project fails to deliver value on time or within budget.  So, as a way to cut risk, the business or agency asks for something of measurable value for each payment.  That’s the idea, anyway.

The problem is: what artifacts of software development have measurable value?  In the waterfall model, we would state that the analysis is valuable.  We put a dollar figure on it, and manage it as a project.

The same with the design.  It is documented to the hilt, with excessive details dealing with everything from architecture, to message definition, to class diagrams, to statements of how performance can be achieved. 

I won’t be the first person to tell you that this process doesn’t work.  So if the process doesn’t work, how are these documents valuable?  The idea is that you could have company A develop the analysis, company B develop the design, and Company C could use them to develop the application.

This is nonsense.  There are two reasons why:

  1. Forced rework and wasted energy.  You cannot overcome the flaws of the waterfall model by introducing competition.  It just fosters the disease.  If the analysis is flawed, then the design team will either have to recreate it, or will produce the wrong design (lose – lose).  Same goes with construction.  The only hope that a company has when they accept a contract of this kind is to build in the costs of performing the analysis and design again under the covers.
  2. Nothing has been developed.  The things that really bite a project are the things you don’t discover until later, and you cannot discover them by delivering a document.  As a result, this process simply pushes all of the changes (and their costs) to the point when the first deliverable is made, usually half-way through construction.  It is too late to avoid cost overruns at this point.

Therefore, if these documents fail to deliver a working project, then your controls don’t work and the project fails.  This engenders more distrust because the agency thought that the vendor knew what they were doing… so they increase oversight and set up payment terms based on delivery of these artifacts, in this order.

Setting up contract terms means assigning a dollar value to each phase.  That creates an interesting question.  If this process is a sure road to failure, what value should the signposts of failure really have?  Not enough to pay for, I’ll tell you that.

So why are most software development RFPs out there written to not only encourage this practice, but to create cash flow problems for companies that want to avoid it?  It occurs to me that many of my friends in the government and large corporate circles may simply not know of another way to manage a vendor.

My response to darkness: light a candle

Now that I’m on the buying side (again), I’ve crafted a new list of requirements for a vendor to fulfill. Of course, this list will be vetted by my organization. So, I decided to put a vanilla list out here, on the web, as a way of providing information for my former clients and other interested parties to see.  I fully expect that my organization will change this list, so this is my way of getting my opinion out unaltered.

The message I want to share is this: provide these criteria to your vendors, instead of the arcane requirements of the waterfall model, and you will encourage your vendors to follow BEST practices, instead of WORST practices… that will make a much bigger difference than you can imagine.

First off, let me say that not all vendors perform all tasks.  I am breaking down the tasks according to analysis, design, construction, and delivery, because it is the model that my former clients are familiar with, and because some vendor companies cannot do all of the steps.  So, when reading each section, realize that each section stands alone.  It can apply to a vendor hired to fulfill that role alone, or to a vendor that fulfills that role as part of a larger process.

However, if you do have one vendor creating requirements and another doing design, and potentially a third doing construction, you have to provide the facility (desks, computers, phones, server software licenses) for them to develop all of the artifacts in one place, because they will ALL be working AT THE SAME TIME.

This is absolutely key to success: the primary effort of analysis is not done until design has released an iteration and code is being delivered, and it has no positive impact if the developers don’t meet, know, and trust the analyst(s).

Agile Requirements Analysis

If a vendor provides the analysis team, then add descriptions to this effect to your RFP.

The vendor will deliver a requirements document containing the items listed below. The requirements document will be delivered more than once, in iterations.  With each delivery, it will be updated to reflect changes. Changes must also be tracked using a formal change management process. Note: The level of detail for each use case must match the use cases delivered. Therefore, if a code drop contains use cases 1-7 plus 12 and 14, then we would expect that these nine use cases will be detailed and described in the requirements document re-delivered with that drop.

The requirements document contains:

  1. A use case survey describing every use case envisioned for the final system (one paragraph for each use case in the survey, no more than would fit on a 3×5 card).
  2. Detailed use cases for every use case to be delivered in the system. See Alistair Cockburn’s book, Writing Effective Use Cases.
  3. A detailed list of actors and roles that are used by the use cases, with defining characteristics for each actor.
  4. A list of supplemental functional specifications, to include: Accessibility, Security, and Privacy, as well as a discussion of the requirements surrounding: Availability, Reliability, Service Level, Scalability, and any Code Re-use requirements that were specified by the users.

  5. A structured glossary and list of business rules defining concepts, formulas, variables, constants, and conditions that control the behavior of the system.

The use case survey (step 1)  will be delivered before the design document.  However, as each use case is detailed, it must be delivered to the design team as part of a weekly deliverable package.  The design team is responsible for delivering the design components to the analysis team and the construction team in a joint weekly meeting.

The analysis team will meet regularly with members of the client staff, design team, and construction team.  For larger projects or remote projects where the project manager may not be able to attend, minutes of these meetings must be produced and filed with the project office showing attendance and results.

If the analysis team begins a use case, it must deliver results for that use case every week until it is complete.  The results must be understandable and useful, but do not need to be complete.  The analysis team must be aware that design and construction will begin immediately after the first iterative delivery of a use case.  Therefore, the use case should contain enough information to assist the design and development team as much as possible.

The vendor performing the analysis should expect that analysis activities will not complete until well after the first stable code drop is accepted by the customer.  However, the headcount needs will diminish dramatically between the first proof of concept deliverable (at peak) and the release of the first stable drop, when a small fraction of the expected team would be required.

How not to do analysis as you would in a waterfall project:

If you notice from my text above, we don’t require all of the analysis up front. In fact, don’t require that any of it is delivered before the first code delivery. That’s right… code first. Caveat: that code should be “proof of concept” code… completely disposable but eminently useful.

By letting your consultants write code before all analysis is done, they flush out further requirements, discover snafus in their ideas, get infrastructure in place, and push risk from the system, early. Wasn’t that the reason we wanted all this stuff up front in the first place?

Agile Design by Vendors

If a vendor provides the design team, then add descriptions to this effect to your RFP.

The vendor will deliver a design document that contains the following items. The document will be re-delivered with each iteration, updated to reflect changes. Note: it is not important for this document to be 100% complete until the code complete milestone. It is, in fact, preferable, that only portions of the document are delivered with each iteration.

  1. A logical data model and data dictionary illustrating the data types, ranges and the relationships between important entities.
  2. A high-level component view, showing individual components and indicating which components communicate with each other
  3. For systems involving multiple servers, a deployment view, showing which components are deployed on which servers in a fully scaled environment.
  4. Depending on the complexity of the components: a layered component view, diagramming which classes or modules exist in each application layer or tier.
  5. A short description of the responsibility of each component.
  6. A short description of the interfaces that components use to communicate. Detailed descriptions are required of any XML or flat file formats used to encode and transmit data.
  7. Screen shots or wireframe diagrams if the user requires them, illustrating how the functional interface will be laid out.
  8. For web applications, an interaction diagram will be provided, illustrating the navigational flow that a typical user will follow through any groups of sequentially ordered pages (like wizards, form fill-in, and payment/checkout).

The goal of design is to progressively develop an approach for describing the components of the system, their responsibilities, and how they interact.  Therefore, with each delivery by the analysis team, the design team will produce (within a week) a series of diagrams that illustrate how the newly delivered requirements affect the current expected design.  The first iteration of design begins after the use case survey is delivered.

The results of the design effort are presented at a weekly joint meeting with the developers, the customer and the analysis team.  The goal is to flush out misunderstandings and see if the customer is likely to approve of the technical approach.  After the most useful use case (as judged by the client) is delivered to the design team, and the design team develops diagrams to support construction, construction will begin.

The vendor performing the design should expect that design activities will not complete until well after the second stable code drop is accepted by the customer.  However, the headcount needs will diminish dramatically between the first stable code drop (at peak) and the release of the second stable drop, when a small fraction of the expected team would be required.

How not to do a Waterfall design

Just as we did with the analysis, we will require that the vendor produce a design. However, we will not require all of it at once. In fact, we do not require more than the mere framework to be delivered until a period of coding is complete (often a month of coding). Then we ask for all sections to have some detailed content. With each iteration, EVERY section should become more detailed until that section is completely specified. (It doesn’t matter which section is completed first, just that they are all completed along the way.)

Also, require that the developers writing the code update this document, but that they spend no more than 10% of their time on it. That means, in a typical month of about twenty working days, they would spend no more than two days working on the document. The rest of the time is to be spent writing code.

What if that’s not enough time to do a “good job” on the document?

Guess what? The document is supposed to be a communication mechanism to allow developers to speak to each other… not to you. If the developers finish writing code, and didn’t need the document to have more in it, then the document is good enough.

Really. It’s OK to write down only what you need to convey and communicate.

Of course, the closer your users can get to the development team, the fewer words you have to write down. Users nearly always do a better job of explaining what a system should do than developers do. After all, they are paying for it, using it, and counting on it.

What to require during Software Construction

If the vendor is involved in the construction of the system, then place text to this effect in your RFP.

During construction of software, the vendor will be required to adhere to the following practices:

  1. All code will be delivered with unit tests using either the NUnit, JUnit, or CPPUnit unit testing framework. The unit tests will exercise every step of each delivered use case that is part of the “normal case” or “sunny day flow.” Unit tests will test the handling of expected exceptions. Unit tests will verify the validations applied to the user interface by invoking assemblies used by the user interface. It is not necessary for unit tests to verify the mapping between object properties and database columns.
  2. All code will be checked in daily and will be compiled at least twice a week. This is under the contractor’s control. However, the client may require build reports to be transmitted for each build as a verification that this is occurring.
  3. As a best practice, the vendor should automatically invoke the unit tests for all checked-in code immediately after the code is compiled. The results of the unit tests, if this step is taken, should be included in the build report. It is not important that all unit tests pass in each build. Any failures that accompany the delivery of demonstrable code should be documented, so that we can verify proper installation.
  4. The vendor will deliver a version of the application with demonstrable functioning code at least once a month. The version does not need to represent a complete set of features. In fact, we would expect that it would not, especially in early stages of construction. The vendor will demonstrate this code, on client servers, to client customers and IT personnel, within a week of each delivery. The vendor will be prepared to describe the functional additions to the system, or the refactoring that took place during the previous month.
  5. Each iteration of deliverable code shall use a Setup project to install it. Copy and Paste installation is only acceptable where there are serious and known issues with using an Installer.
  6. Each iteration of deliverable code shall include a full copy of the source tree.
  7. At least once in every month, the vendor will perform a design review of code being delivered during that month, with at least one client employee present either by NetMeeting or in person. The design review may cover topics in data architecture, class architecture, module architecture, messaging, reliability/scalability, security/privacy, accessibility/usability, as needed by the application team during that iteration.
  8. The vendor will either follow the client’s coding standards document, or will deliver a coding standard to the client for review and feedback before commencing the construction phase in earnest. The code that is delivered must match the standard as agreed. If the code standard originates from the vendor, and the client determines that the standard is not sufficiently detailed, or does not sufficiently protect good coding practices, the client may require the vendor to conform to their coding standard.
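As a rough illustration of the unit testing practice in item 1, here is a minimal xUnit-style sketch using Python’s unittest module (the RFP text names NUnit, JUnit, and CPPUnit; the `OrderService` names and business rules below are hypothetical, not from any real system):

```python
import unittest


class OrderService:
    """Hypothetical service implementing one step of a use case."""

    def place_order(self, quantity, unit_price):
        # Expected exception path: reject non-positive quantities.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        return {"status": "placed", "total": quantity * unit_price}


class OrderServiceTests(unittest.TestCase):
    """Run with: python -m unittest <this file>"""

    def test_sunny_day_flow(self):
        # Exercise the normal-case ("sunny day") step of the use case.
        result = OrderService().place_order(quantity=3, unit_price=10.0)
        self.assertEqual(result["status"], "placed")
        self.assertEqual(result["total"], 30.0)

    def test_expected_exception_is_handled(self):
        # Verify the handling of an expected exception, per practice #1.
        with self.assertRaises(ValueError):
            OrderService().place_order(quantity=0, unit_price=10.0)
```

The point is not these specific assertions, but that every delivered use case step arrives with tests like these riding along with the code.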

How to avoid Waterfall construction

You can see the iterations in this process, much more so than in the Analysis and Design phase. That’s because we assume you have to do SOME analysis and SOME design before the construction team delivers usable “final state” code. This means that each iteration contains an update to those documents. Code, on the other hand, doesn’t get done until it’s done. As a result, the iterations are much more stark.

All the same, each iteration must be time-boxed.  There is a fixed (short) period of time in which to deliver. If you have to choose between shipping all features and shipping on time, drop a feature. Pick it up in the next iteration. Do Not Slip.  

Even more importantly, each iteration must contain a representative of every deliverable artifact. In other words, if you are planning to deliver an online help document, you should deliver one with each iteration, even if the first three times you deliver it, it doesn’t contain that many articles.

Don’t be tempted to say “it’s a short project… we only have time to deliver once!” If the project lasts more than a week, you have time for multiple deliveries.

Remember: deliver deliver deliver

Intelligent Software Stabilization

If your vendors are involved in the final delivery of the software system (nearly always true), then place text to this effect in your RFP.

The construction phase transitions to stabilization when all use cases required for the delivery of the system have been coded to completion.  This determination is made at the sole discretion of the construction team in conjunction with the customer. 

At this point, one more full code iteration will occur for the construction team to stabilize the code base, refactor out inefficiencies, and correct mistakes made in the implementation of patterns and algorithms.  The functionality at the end of this iteration is not expected to change much from the prior iteration.  This iteration exists specifically so that the construction team can improve quality, fill out remaining (boundary and exception) test case code, and make the system more resistant to events that could cause failure.

The first drop of the stabilization phase will be delivered to the client’s systems at this point, and user acceptance testing will commence.  The users understand that the system may not be stable at this point, and they will have been involved in seeing code releases before this point.  However, this phase will allow for minor (cosmetic) changes to the functionality as well as the repair of defects.

All testing is expected to be conducted on the client’s site, and will involve client resources in individual contributor roles.  It must occur on hardware and software that is a good model for the equipment and configuration that will be used in a production environment.  The stabilization phase will include the following specific deliverables:

  1. All configuration management will be handed over to the maintenance team (client IT resources), who will be provided documentation and training on how the system works, how to fix expected errors, and how to troubleshoot issues.
  2. All user documentation will be delivered to the client and at least one training session will be conducted (for train-the-trainer exercises) or more (for train-the-user exercises).
  3. Performance, Stability, and Scalability testing results will be delivered to the client showing that the system will operate within the parameters defined during the analysis phase.

The system will be considered complete after the users accept the functionality as being “correctly interpreted from the use cases as delivered”.  Note that any expensive changes that are first raised during user acceptance testing will be considered a change request if the client organization could reasonably have raised the same issue during one of the demonstrations held during the construction phase.


The text of an RFP is wildly important to how the entire project will be run, since it drives financial terms, measurement criteria, and even decisions on the part of experienced consulting teams about whether to even bother bidding on the contract.  If the terms of software development are carefully described to encourage agile software development processes, then the client company and the vendor will both benefit from a more reasonable and more orderly process, where information is collected early and presented often in a valuable manner.

Workflow patterns – so much more left undone

By |2004-10-15T19:08:00+00:00October 15th, 2004|Enterprise Architecture|

I have been following the progress of Dr. Wil van der Aalst in his efforts to create a pattern language for workflow processes, as you know if you’ve read my posts.  First, the workflow patterns were described, then a comprehensive comparison of different workflow systems with respect to the patterns, and most recently, the YAWL effort to create an open source workflow solution that encompasses all of the patterns.

What a terrific thing this is!   YAWL implements all of the identified patterns… but the list is incomplete.

With no intent to criticize, I respectfully submit that the patterns identified are useful as atomic building blocks, in that they represent all of the patterns that exist at the Work Step level of abstraction (and somewhat at the Business Process level).  However, there are additional patterns at the Business Process level that are not identified (I can easily envision a few) and all of the patterns at the Business Unit level are completely absent.

What this means is that there’s more work to be done.

For example, at the business unit level…

Just a reminder for folks who don’t care to jump down to my earlier blog on multiple levels of abstraction in workflow… The business unit level of abstraction exists to show the collaboration between different business units (possibly within a company, possibly with units in other companies).  The diagram here is multi-threaded with the distinction between “context documents” which establish the context for a conversation, and “messages” which reference and require the existence of these context documents.  At the business unit level, we are not concerned with what goes on within a business unit.  However, a message from the unit may appear and may be sent to a partner, which can drive effects (that would be specified at the business process or work step levels).

So, one workflow pattern not included in the list would be a “rollback with penalties” pattern (my term).  In this pattern, a message arrives at a business unit that is currently in the midst of executing a workflow pathway… (the message arrives at the entire unit, not just at a point in the specific pathway).  This pattern exists if the message causes the workflow in process to roll back activities within the unit, run a special process to incur a penalty, and then potentially branch to a different business unit.

A good example would be: a customer commits to purchasing a passenger plane from AirBus.  Six months before delivery, when AirBus has already constructed some parts and is awaiting the arrival of others, the customer calls and cancels the order.  AirBus, according to its contract, needs to absorb the plane parts into other planes, and will inevitably charge a hefty fee for cancelling the order.
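A “rollback with penalties” pathway like this can be sketched as a compensation handler. This is a minimal Python model under my own assumptions; the `Workflow` class, step names, and fee are hypothetical and not part of YAWL or any workflow standard:

```python
class Workflow:
    """Minimal model of 'rollback with penalties' (hypothetical names)."""

    def __init__(self, cancellation_fee):
        self.cancellation_fee = cancellation_fee
        self.completed = []   # (activity, compensating action), in order
        self.charges = []

    def run_step(self, activity, undo):
        # Record each completed activity with its compensating action.
        self.completed.append((activity, undo))

    def cancel(self):
        # 1. Roll back completed activities in reverse order.
        for activity, undo in reversed(self.completed):
            undo(activity)
        self.completed.clear()
        # 2. Run the special process that incurs the penalty.
        self.charges.append(("cancellation fee", self.cancellation_fee))
        # 3. The caller may now branch to a different business unit.
        return sum(fee for _, fee in self.charges)


# Usage: an order is partway through construction when cancellation arrives.
undone = []
wf = Workflow(cancellation_fee=250_000)
wf.run_step("build wing assembly", undone.append)
wf.run_step("reserve engines", undone.append)

penalty = wf.cancel()
print(undone)    # ['reserve engines', 'build wing assembly']
print(penalty)   # 250000
```

The essential structure is the same as the AirBus scenario: completed work is compensated in reverse order, a penalty process runs, and control may then flow elsewhere.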

Certainly logic of this nature is quite readily expressed in the YAWL language by combining a long series of atomic structures.  That is appropriate for the Work Step level of abstraction.  However, complete discussions can be held about the implications of this kind of enterprise-level pattern entirely separate from the details of how it is implemented using Splits and Joins.

This is part of the reason why the Gang of Four book on design patterns was not the end of the discussion on object oriented design patterns. It was, instead, the beginning.  In addition to numerous additional patterns at the detailed level of abstraction where the GoF book operated, a series of influential books appeared in the following years detailing patterns at different levels of abstraction (at the enterprise level, at the application messaging level, and at the detailed package level).

This is also the reason why the patterns work in workflow must now continue, to identify patterns at these additional levels of abstraction.

The infancy of workflow diagramming standards

By |2004-10-05T17:23:00+00:00October 5th, 2004|Enterprise Architecture|

I did something foolish recently… I criticized someone for an analysis diagram that, I felt, didn’t use “standard” workflow notations.

Granted, the diagram looked very different from the kinds of diagrams that have been coming out of workflow tools, and it definitely wasn’t compliant with UML Activity diagrams, but that still didn’t give me the right to be critical.  How can I tell someone to “follow a standard” when a standard doesn’t exist?

There are some links to an emerging standard, however, that I would like to share.  First off: the Business Process Modeling Notation (BPMN) specification (currently in v1.0 draft status) is an open source initiative to create a common diagramming standard that (a) doesn’t have the limitations of UML as it applies to human collaborative workflow, and (b) provides a good standard for business process modeling that can be mapped to BPEL.


On that site, you will find a good (fairly short) paper comparing BPMN and UML Activity diagrams, written by Stephen White of IBM.  I heartily recommend this paper.

One problem that I have with diagrammatic standards to date, including BPMN, is the lack of clear determinism in the diagrams themselves.  Unfortunately, I can’t level my criticism in this blog very easily, because I haven’t quite figured out how to embed an image in the blog!  I will, however, try to explain.

In code, we can show the beginning of something and the end of something… we call it scope.

public void myroutine(int param1)
{ // myroutine scope starts here
   if (condition == true)
   { // if scope starts here
   } // if scope ends here
} // myroutine scope ends here

However, in workflow diagrams, this is not so easy to do.  Workflow diagrams are graphs.  Pathways can snake through a diagram without respect to starting points or ending points.  Process A can branch out to two parallel processes, B and C, which then merge back together in process D.

This looks good in a small excerpt.  However, if we put in semantics, and a real world scenario, we can end up with lots of hidden rules.  For example: let’s say that we have the process described above (A -> (B and C) -> D)… Now, let’s add the workflow pattern of a synchronization… this means that process D cannot begin until both processes B and C are complete.
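To make the synchronization semantics concrete, here is a tiny sketch (in Python, with hypothetical names of my own) of a join that refuses to start D until both B and C have reported completion:

```python
class SynchronizingJoin:
    """Fires its target only after every required branch completes."""

    def __init__(self, required, target):
        self.waiting = set(required)  # branches we are still waiting on
        self.target = target

    def branch_done(self, name):
        self.waiting.discard(name)
        if not self.waiting:          # all branches complete: fire D
            return self.target()
        return None                   # still waiting on some branch


log = []
join = SynchronizingJoin({"B", "C"}, target=lambda: log.append("D started"))

join.branch_done("B")   # D must not start yet
assert log == []
join.branch_done("C")   # now both branches are complete
assert log == ["D started"]
```

The hidden rule is all in that `waiting` set: the join has to know exactly which branches it is synchronizing, which is precisely the scope information the diagram does not carry.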

That’s all well and dandy, but what if B loops back around and merges with A? Would this be legal? 

What if B calls A, and when A is done, A is expected to use some form of “return” notation to denote that we should return to B and then continue on to D?  Is this legal?

The reality is that there is a scope for the parallel split that occurs when the workflow process branches from A to both B and C.  That scope is closed by the join at the end (from B and C down to D).  That scope needs to have properties, and to be respected, to make sense.  The engines need to know which processes are related, and which processes are required to complete.

The diagramming standards provided don’t enforce that sense.  There are no constraints. 

As a result, it is trivial to take a workflow model, as diagrammed in BPMN, and create a structure that works very well… then add a perfectly legal link and create a structure that works very poorly or not at all.
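The constraint I am asking for could even be checked mechanically. Here is a hypothetical sketch (my own names, not from BPMN or UML) of a validator that treats the region between a parallel split and its synchronizing join as a scope, and flags any edge that crosses the scope boundary other than through the split or the join:

```python
def scope_violations(edges, split, join, members):
    """Return edges that enter or leave the split/join scope illegally.

    edges:   iterable of (source, target) pairs in the workflow graph
    split:   the node that opens the scope (the AND-split after A)
    join:    the node that closes the scope (the synchronizing join before D)
    members: the set of nodes inside the scope (the parallel branches)
    """
    inside = set(members) | {split, join}
    bad = []
    for src, dst in edges:
        enters = src not in inside and dst in members
        leaves = src in members and dst not in inside
        if enters or (leaves and dst != join):
            bad.append((src, dst))
    return bad


# A -> split -> {B, C} -> join -> D : well formed, no violations
edges = [("A", "split"), ("split", "B"), ("split", "C"),
         ("B", "join"), ("C", "join"), ("join", "D")]
print(scope_violations(edges, "split", "join", {"B", "C"}))  # []

# Add a 'perfectly legal'-looking link B -> A: it escapes the scope
print(scope_violations(edges + [("B", "A")], "split", "join", {"B", "C"}))
# [('B', 'A')]
```

A diagramming standard with this kind of rule built in would reject the bad link at modeling time, rather than leaving the engine to discover it at run time.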

In software development, we learned this lesson the hard way, in 1968, when Edsger Dijkstra published a letter titled “Go To Statement Considered Harmful” (see http://www.acm.org/classics/oct95/).  This is one of the key triggers of a series of changes in language design that led, ultimately, to structured programming.  I would posit that without the “culture” of questioning language design that arose out of this transformation, the innovations that led to Object Oriented development, and now Aspect Oriented Development, would not have been possible.  Constraints are good.

Of course, I am no expert in UML 2.0 or in BPMN, so it is quite possible that there is a rule, somewhere, that says that my assertion would not be legal.  I have not seen it, and if it exists, I encourage my fine readers to point it out to me. 

My concern is that we have created two standards for workflow notation (BPMN and UML 2.0) that are no different in their fundamental constraints than BASIC and FORTRAN were… they both appear to be at the same level of evolution, where it is still more important to diagram what can be done than to constrain the modeler from representing flows in a way that should not be done.

B+ for effort.  C- for results.