Quality Matters
Over the past year, we have worked on a new version of Tideway Foundation. As mentioned previously, we have introduced a systematic way of describing the patterns that occur in IT using a new Pattern Language, Tipple. Tipple is used by Foundation to figure out how to discover and model complex IT infrastructures - but in the end, it is of course just a tool in the toolbox.
What really matters is the end result, which in this case is relevant, stable and accurate information about the IT infrastructure. As you know if you have ever used any, the trouble with most tools that do this is that the data is presented as a fait accompli: the tools use a black-box mechanism to arrive at their results, meaning that it’s somewhere between hard and impossible to verify the quality of the data in a systematic fashion.
Designed for transparency
To overcome this fundamental issue, we designed the new system so that the data Foundation produces is completely transparent. The end result is that the system automatically produces a comprehensive set of provenance links for all of the entities Foundation infers.
The significance of this is most easily illustrated by an example. In Foundation, the data is structured into several logical parts, the three most prominent being the directly discovered data (DDD), the inferred data, and the knowledge data in the form of patterns - illustrated in the graphic to the right.
The DDD consists of the “raw” data that Foundation gathers using a variety of techniques. The knowledge embedded in the Patterns is what causes Inferred Data to be created, based on the DDD. When an inference is drawn, the linking of the entities involved is done automatically and the result can be viewed in the User Interface, used in reports, or exported to other systems.
Below is a somewhat contrived example where we have scanned two IP addresses and have discovered a few processes and a file in doing so; this results in a set of nodes in the DDD area. Foundation has used this information (and the information stored in the patterns) to infer the existence of two Hosts that each run one piece of Software, as shown in the graphic. It has also inferred the existence of a Business Application that relies on the two pieces of software as well as some configuration information in the File that was found on the first of the two hosts.

All of the dashed links indicate inferences - these are stored in the model and can be used to determine what exactly caused an entity to be inferred and how the values of the associated attributes were deduced.
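To make the idea of inference links concrete, here is a minimal sketch in Python. It is not Foundation's data model or API: the `Node` class, the `inferred_from` field, the `trace_to_ddd` helper and all node names are hypothetical, and the sketch assumes each piece of Software was inferred from one discovered process. It only shows how stored links let you walk from an inferred entity back to the raw data behind it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entity in the model (hypothetical structure, for illustration only)."""
    kind: str
    name: str
    area: str  # "DDD" for directly discovered data, "inferred" otherwise
    # The nodes this one was inferred from (the dashed links in the graphic).
    inferred_from: list["Node"] = field(default_factory=list)


def trace_to_ddd(node: Node) -> set[str]:
    """Follow inference links back to the directly discovered nodes."""
    if node.area == "DDD":
        return {node.name}
    found: set[str] = set()
    for source in node.inferred_from:
        found |= trace_to_ddd(source)
    return found


# Directly discovered data: two processes and a file (names are made up).
proc_a = Node("Process", "process-a", "DDD")
proc_b = Node("Process", "process-b", "DDD")
config = Node("File", "config-file", "DDD")

# Inferred data: assumed here to be derived from the nodes listed.
sw_a = Node("Software", "software-a", "inferred", [proc_a])
sw_b = Node("Software", "software-b", "inferred", [proc_b])
app = Node("BusinessApplication", "app", "inferred", [sw_a, sw_b, config])

print(sorted(trace_to_ddd(app)))
# ['config-file', 'process-a', 'process-b']
```

Asking the same question of `sw_a` returns only `process-a`, which is the kind of "what exactly caused this entity to be inferred" query the links are meant to answer.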
Managing the lifecycle
What is not shown in the diagram is equally interesting but harder to illustrate in a simple example: inference relationships are used to indicate the origin of a particular attribute value as well as lifecycle inferences.
The lifecycle of inferred entities is something that most discovery tools don’t discuss, but lifecycle events such as a host or a piece of software being removed from the model are often information worth highlighting.
Again, Foundation delivers a robust representation of this: when an entity is explicitly deleted because a removal criterion is met, the corresponding node is marked as deleted, and a relationship to the entity or entities that caused the removal is created. This way, it is entirely feasible to browse the model for entities that have been removed from the model and at a glance see why - they are not actually deleted, merely marked as such.
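The "marked as deleted, with a link to the cause" approach can be sketched as follows. Again this is an illustration under assumptions, not Foundation's implementation: `Entity`, `mark_removed`, `removed_entities` and the example names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An inferred entity that is flagged rather than physically deleted."""
    name: str
    deleted: bool = False
    # Names of the entities that caused the removal (hypothetical field).
    removed_because_of: list[str] = field(default_factory=list)


def mark_removed(entity: Entity, causes: list[str]) -> None:
    """Flag the entity as deleted and record what triggered the removal."""
    entity.deleted = True
    entity.removed_because_of.extend(causes)


def removed_entities(model: list[Entity]) -> dict[str, list[str]]:
    """List every removed entity together with the reasons for its removal."""
    return {e.name: e.removed_because_of for e in model if e.deleted}


model = [Entity("host-1"), Entity("software-a")]
mark_removed(model[1], ["process-a no longer seen"])

print(removed_entities(model))
# {'software-a': ['process-a no longer seen']}
```

Because the node stays in the model, the question "what was removed, and why?" becomes an ordinary query rather than a search through logs.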
IIS by example
In the screenshot to the right, Foundation shows an instance of Microsoft IIS v6.0 that it has discovered on a particular host. Foundation also shows the name of the maintaining pattern that was the basis for inferring IIS.
To see in more detail what is happening, the user can click on the pattern link (which will show a list of all of the entities it has helped model and the pattern itself) or use the Provenance information to show all of the relevant detail. In this example, it looks something like this:

There is a lot of information here, but it’s all fairly easy to digest. The “primary data provenance” shows a link to a discovered process that caused the pattern to be triggered in the first place. Each of the attributes also shows its provenance - in this case, two of the attributes are from the pattern (which means that the value is a constant encoded in the pattern), and two other attributes are retrieved from the Windows registry.
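Per-attribute provenance can be pictured as each value carrying a pointer to its source. The sketch below is only an illustration: the attribute names, the values other than the IIS version, the `Provenance` and `Attribute` classes and the `by_source` helper are assumptions, and the registry locations are left as placeholders rather than real keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    source_kind: str  # e.g. "Pattern" or "RegistryValue"
    detail: str       # where exactly the value came from


@dataclass
class Attribute:
    value: str
    provenance: Provenance


# Two attributes are constants from the pattern, two come from the registry.
# Attribute names and registry details are placeholders, not real data.
iis = {
    "name": Attribute("Microsoft IIS", Provenance("Pattern", "constant in pattern")),
    "vendor": Attribute("Microsoft", Provenance("Pattern", "constant in pattern")),
    "version": Attribute("6.0", Provenance("RegistryValue", "<registry key 1>")),
    "install_path": Attribute("<path>", Provenance("RegistryValue", "<registry key 2>")),
}


def by_source(attrs: dict[str, Attribute]) -> dict[str, list[str]]:
    """Group attribute names by the kind of source they were taken from."""
    grouped: dict[str, list[str]] = {}
    for name, attr in attrs.items():
        grouped.setdefault(attr.provenance.source_kind, []).append(name)
    return grouped


print(by_source(iis))
# {'Pattern': ['name', 'vendor'], 'RegistryValue': ['version', 'install_path']}
```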
Following the link to one of the RegistryValue entries shows exactly where the value came from:
Verifiably trustworthy data
Having all of this information available means that gaining complete trust in Foundation becomes very easy: everything it tells you is completely transparent. And because the mechanism that generates the inference information is a core part of the Tipple engine, it extends to new entities or attributes that are introduced in a new pattern and are not part of the standard taxonomy of Foundation - without any modification to the system itself.
The paradox of course is that this unique feature is likely to make itself mostly irrelevant over time as our users get used to the data just being accurate. I imagine it will then only really be used when developing or testing new patterns…

By tim.coote on 31 Aug 2007