I like the term “foundational technology”. It sounds so much cooler than it is. I have no idea if there is one and only one definition for it. My definition, in this post, is this: A foundational technology is a technology that provides infrastructure for another technology.
In this context, we would say that a document management system could require a relational database system as a foundational technology, because all of the document management systems that I know of store some data in a relational database. (I’m sure there’s an exception to that statement somewhere… it’s an example… please don’t flame me).
And now to the meat: In my opinion, document management is a foundational technology for most workflow systems.
This means that workflow systems are NOT part of document management, but good workflow requires at least some element of document management in order to function properly.
This means that workflow should be built on the concepts of document management, but workflow does not, in and of itself, represent core functionality of document management. It is layered on top of it.
In an application where the records are actually in a database, and not stored in stand-alone documents, this means that the workflow process will treat the records as documents. The workflow process should not be dealing with relational intricacies like many-to-many relationships. As far as the workflow process is concerned, the data that it deals with is self-contained and self-descriptive. The fact that this data may be managed in a relational database system is entirely unimportant from a workflow standpoint.
(In the engine I am working on, this is enforced by requiring the application to extract the data from the database and pass it to the engine in the form of an XML document. The same document comes back from the workflow engine, potentially modified. The app is free to break it back out into records and update the database.)
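To make that round trip concrete, here is a rough sketch in Python. It is not the engine's actual interface: the element names, the sample rows, and the `fake_workflow_step` function are all made up for illustration. The only point is that the workflow side sees one self-contained XML document and never touches the tables.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, as the application might read them from its own tables.
order_row = {"id": "1001", "status": "submitted", "total": "250.00"}
line_rows = [
    {"sku": "A-1", "qty": "2"},
    {"sku": "B-7", "qty": "1"},
]

def records_to_document(order, lines):
    """Flatten an order and its line items into one XML document."""
    root = ET.Element("order", id=order["id"])
    ET.SubElement(root, "status").text = order["status"]
    ET.SubElement(root, "total").text = order["total"]
    lines_el = ET.SubElement(root, "lines")
    for line in lines:
        ET.SubElement(lines_el, "line", sku=line["sku"], qty=line["qty"])
    return ET.tostring(root, encoding="unicode")

def fake_workflow_step(xml_text):
    """Stand-in for the engine (an assumption): it only reads and edits the document."""
    root = ET.fromstring(xml_text)
    root.find("status").text = "approved"
    return ET.tostring(root, encoding="unicode")

def document_to_records(xml_text):
    """Break the returned document back out into rows the app can save."""
    root = ET.fromstring(xml_text)
    order = {
        "id": root.get("id"),
        "status": root.findtext("status"),
        "total": root.findtext("total"),
    }
    lines = [dict(line.attrib) for line in root.iter("line")]
    return order, lines

doc_out = records_to_document(order_row, line_rows)
doc_back = fake_workflow_step(doc_out)
updated_order, updated_lines = document_to_records(doc_back)
```

After the round trip, `updated_order` carries whatever the workflow changed, and it is up to the application to decide how that maps back onto its tables.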
So, what elements of document management are required for workflow to “work?” I refer to a “managed document” to mean a document that is within the operational boundaries of a workflow management system.
Managed documents are self-contained and self-describing. This means that any relationships to outside data (vendor numbers, product codes, country codes, etc.) are not important for the processing of the document through the workflow itself. If data is required for the processing of a document, then the related values should be brought into the document, and decisions should be made not on the coded foreign key, but on the business value from the lookup table itself.
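A small sketch of what "bring the related values into the document" could look like, again in Python. The `COUNTRIES` lookup table, the field names, and the routing rule are invented for the example. The application resolves the foreign key before handing the document over, and the workflow decision reads the business value rather than the code.

```python
# Hypothetical lookup table that lives in the application's database.
COUNTRIES = {
    "US": {"name": "United States", "region": "North America"},
    "DE": {"name": "Germany", "region": "Europe"},
}

def enrich(record):
    """Copy the looked-up business values into the document itself."""
    country = COUNTRIES[record["country_code"]]
    doc = dict(record)  # leave the application's record untouched
    doc["country_name"] = country["name"]
    doc["region"] = country["region"]
    return doc

def route(doc):
    """Workflow decision based on the business value, not the foreign key."""
    return "eu-review" if doc["region"] == "Europe" else "standard-review"

invoice = {"invoice": "INV-1", "country_code": "DE"}
queue = route(enrich(invoice))
```

Note that `route` never consults `COUNTRIES`. Everything it needs is already in the document it was given.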
Managed documents are persistent. Their lifetime is determined by business value and business rules, and they nearly always remain in existence throughout the life of the workflow process (at least) and often much longer. This stability is expected for a system that needs to manage the information about an item over potentially long periods of time.
Managed documents are individually addressable. While the name of a document does not need to be unique, the name of the document within a “directory” (or document collection) would be unique. The system has one clear way to refer to the current version of each document. (Prior versions are perfectly normal. However, managing prior versions is not a workflow function, and the workflow system is not normally required to be aware of it).
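One way to picture the addressing rule is the toy `DocumentCollection` class below. It is an illustration only, not taken from any real document management product. A name may repeat across collections, but within one collection it resolves to exactly one current document.

```python
class DocumentCollection:
    """Toy collection: document names are unique within it."""

    def __init__(self, name):
        self.name = name
        self._docs = {}

    def add(self, doc_name, content):
        """Store a new document and return its address within the system."""
        if doc_name in self._docs:
            raise KeyError(f"{doc_name!r} already exists in {self.name!r}")
        self._docs[doc_name] = content
        return f"{self.name}/{doc_name}"

    def current(self, doc_name):
        """The one clear way to reach the current version of a document."""
        return self._docs[doc_name]

invoices = DocumentCollection("invoices")
contracts = DocumentCollection("contracts")

# The same name in two different collections is fine.
addr_a = invoices.add("acme.xml", "<invoice/>")
addr_b = contracts.add("acme.xml", "<contract/>")
```

Prior versions are deliberately left out of the sketch, since tracking them is not something the workflow side needs to know about.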
One version of the managed document exists at any one time. This is an important one. Most document management systems do a fairly good job of allowing a user to check out a document for modification, and then check it back in with changes. This can occur in “groups” of documents (where all documents changed should be treated as a single change for the sake of check-in), or it can occur for single documents. However, there is one “master copy”. After each operation, the system reaches a stable state by ending up with one master copy (even if a merge operation is required). In an RDBMS, we say that we want our transactions to be atomic. In a DMS, we want to have only one version of the document “present.” (This is why relational databases have proven so useful for document management… they dovetail nicely).
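Here is a minimal sketch of the check-out and check-in cycle for a single document. The `ManagedDocument` class is hypothetical, and it takes the simplest route, an exclusive lock, instead of the merge that some systems allow. Either way the outcome is the same: after every operation there is exactly one master copy.

```python
class ManagedDocument:
    """Toy managed document with a single master copy."""

    def __init__(self, content):
        self._master = content
        self._checked_out_by = None

    @property
    def master(self):
        return self._master

    def check_out(self, user):
        """Hand out a working copy; refuse if someone else holds the document."""
        if self._checked_out_by is not None:
            raise RuntimeError(f"already checked out by {self._checked_out_by}")
        self._checked_out_by = user
        return self._master

    def check_in(self, user, new_content):
        """Replace the master copy in one step and release the lock."""
        if self._checked_out_by != user:
            raise PermissionError(f"{user} does not hold the check-out")
        self._master = new_content
        self._checked_out_by = None

doc = ManagedDocument("draft v1")
working = doc.check_out("alice")
doc.check_in("alice", working + " (reviewed)")
```

Until `check_in` runs, everyone else still sees the old master copy. That is the document-level counterpart of an atomic transaction.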
In an environment where these rules can be followed (most applications can be fit to these criteria), workflow systems are fairly simple to implement. The engine that I am working on would have no difficulty tying in to a .NET application that can meet this notion of document management, even if it doesn’t actually manage documents.
One thought on “Document Management as a foundational technology for Workflow”
Very thoughtful and interesting post. I’ve spent a lot of time working on and around this topic, and I can tell that you probably have too. It doesn’t matter whether you call them documents or just content; the "management" part is pretty much always the same.
I would add one more bullet to your list:
Managed documents are relational. Relationships between a document and its related content (other documents) must also be managed in order to sustain the first point, self-containment. If your system can’t understand the relationships, it can’t provide boundaries of containment. A workflow system cannot function unless those boundaries can be clearly and consistently determined.
I guess you may argue that self-description implies description of relationships as well, but in my mind those are two different things.