Configuration Management
Monday, November 29, 2010
I spot an opportunity for Business Analyst – Configuration Management with an investment bank.
Configuration Management is not, as far as I can see, particularly difficult in theory – but I bet it's a bugger in practice: it's all very well if everything has been under configuration control since Day 1 (unlikely) and it would all be quite well if everything was easily identifiable (it isn't), and it would be, well, surprising if all the things whose configuration needed to be managed were known in advance.
The shade (wishful thinking?) of Donald Rumsfeld speaks: the list of things whose configuration needs to be managed is one of those known unknowns.
Take a for-instance: years ago when I was a simple hardware engineer, I worked with a slightly dubious circuit design that included a monostable – a thing that produces an output pulse of defined width upon receipt of an appropriate input pulse. (Such devices are generally not best practice, but there were mitigating circumstances.)
Anyway, a new batch of boards didn't work; investigation eventually revealed this monostable, a 74LS221 I think, to be at fault. The design called for an LS221 and, since devices with the same number were all supposedly interchangeable, regardless of manufacturer, the configuration control did not control the chip manufacturer – and this was a degree of freedom to be exploited by purchasing in minimising the component cost.
It emerged eventually that the chip design/implementation was faulty, so that particular manufacturer was scratched as a supplier for certain classes of device... and then later the same fault turned up in a device from a different manufacturer, who had either licensed the (faulty) design/manufacturing mask or was re-badging the product.
So, even with very tight configuration management, whether you use white-lists or black-lists, there's always another unknown waiting to catch you out, one of those inevitable unknown unknowns.
The problem is the same regardless of the scale of the systems under configuration control: PC manufacturers will sell PCs with identical part numbers containing different but supposedly functionally equivalent components, except that they're not always equivalent: I have personally seen Microsoft Excel behave quite differently on two ostensibly identical groups of machines, which did in fact differ in their internal chipsets. Different chips can mean different device drivers... how on earth does one efficiently manage the configuration of systems that are superficially identical yet may – or may not – actually be identical in all material respects? This is one of the real challenges of configuration management.
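One way to picture the problem is to stop trusting the part number and compare what is actually inside each box. The following is only a minimal sketch under assumptions of my own: the machine names, part number, component fields and driver versions are all hypothetical, and a real inventory would have to be collected by whatever tooling the estate already uses.

```python
import hashlib
import json
from collections import defaultdict
from typing import Dict, List


def fingerprint(components: Dict[str, str]) -> str:
    """Return a short, stable hash of one machine's component inventory."""
    # Sorting the keys makes the hash independent of dictionary ordering.
    canonical = json.dumps(components, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def group_by_fingerprint(machines: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Group machine names by the fingerprint of what they actually contain."""
    groups = defaultdict(list)
    for name, components in machines.items():
        groups[fingerprint(components)].append(name)
    return {fp: sorted(names) for fp, names in groups.items()}


# Hypothetical inventory: three PCs sold under the same part number.
machines = {
    "pc-001": {"part_number": "PN-1234", "chipset": "vendor-a-rev1", "nic_driver": "1.0"},
    "pc-002": {"part_number": "PN-1234", "chipset": "vendor-a-rev1", "nic_driver": "1.0"},
    "pc-003": {"part_number": "PN-1234", "chipset": "vendor-b-rev2", "nic_driver": "2.3"},
}

groups = group_by_fingerprint(machines)
for fp, names in groups.items():
    print(fp, names)
```

Machines that share a part number but land in different groups are the "ostensibly identical" ones worth a closer look. Of course, this only catches differences in the things somebody thought to record, which is rather the point of the story above.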
And before you nod off with boredom, here's one of the biggest challenges of configuration management: Disaster Recovery.
Disaster Recovery often relies on having duplicate systems ready to take over when primary systems fail: it's all very well having robust and fault-tolerant systems, but what happens if a critical business system is taken offline suddenly, whether as the result of a natural disaster or not?
What business wants to happen is for all data, transactional information, etc. to continue to be available on a backup system that is identical in its operation to the "live" system; some degradation in performance may be acceptable (depending on the magnitude of the disaster) but the essence of DR is that business should continue pretty much "as usual". However, if this is to happen reliably – as opposed to just once on acceptance of the DR solution – changes in the live systems must be reflected in the DR capability... and there's the rub.
What should happen, but doesn't necessarily happen, is that system configuration changes should be reviewed not only for their general suitability, but also specifically for their ability to be replicated in the DR setup. I personally noted one very large government system that was set up and managed in such a way that, whilst the DR solution was sure to work when first implemented, there was no thought given to the need for reflective configuration management, live-backup cross testing or managed retesting of the DR solution.
The common experience of the home IT user, that when the hard drive on the desktop PC fails you suddenly realise either that data backups haven't been made for too long or, if they have, that the configuration has changed so much that the restored data is difficult to re-use effectively, is easy to replicate on vastly greater scales in business.
However, all one needs to know is that configuration management is not only about recording the status quo, it is also always about ensuring that the status quo can be reproduced. It does introduce new factors to Change Reviews and Implementation Plans (never, ever implement a change until and unless a matching change has been designed and tested on the DR system... and the two implementations can be synchronised) etc. but it can be done... if one is careful to bear in mind the purpose of Change Management, rather than unthinkingly following existing processes ad infinitum.
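The rule in the parenthesis above can be written down as a simple gate in a Change Review. This is a sketch only, assuming a hypothetical change record with three yes/no fields that I have invented for illustration; any real change management tool will have its own data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChangeRequest:
    """Hypothetical change record; the field names are illustrative only."""
    change_id: str
    dr_change_designed: bool       # a matching change exists for the DR system
    dr_change_tested: bool         # that matching change has been tested on DR
    rollout_synchronised: bool     # live and DR implementations are scheduled together


def blocking_reasons(change: ChangeRequest) -> List[str]:
    """List every reason the change must not yet go to the live system."""
    reasons = []
    if not change.dr_change_designed:
        reasons.append("no matching change designed for the DR system")
    if not change.dr_change_tested:
        reasons.append("matching DR change has not been tested")
    if not change.rollout_synchronised:
        reasons.append("live and DR implementations are not synchronised")
    return reasons


def may_implement(change: ChangeRequest) -> bool:
    """A change may proceed only when nothing blocks it."""
    return not blocking_reasons(change)


# Hypothetical example: designed for DR but never tested there.
untested = ChangeRequest("CHG-0001", True, False, True)
print(may_implement(untested), blocking_reasons(untested))
```

Returning the reasons rather than a bare yes or no is a deliberate choice: a review board generally wants to know what is missing, not merely that something is.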
Every time something changes, Change Management may need to change too...