We often reflect on how technology develops at an amazing rate, but even those of us who live in a world where Moore’s Law defines the timeline will have found the change and upheaval of 2020 a shocking and disorienting experience.
Much has been written about the pivot to online everything and the emergence of a new normality. New models of teaching, research and assessment are being implemented at breakneck speed, and the systems and processes that underpin these activities have to adapt. This raises profound challenges for our data and the way we use it.
What does data do?
Data exists in systems in order to describe the real world. The things in our world and the relationships between those things become our data models. The measurements and classifications that we use to describe these things become the individual fields in our datasets. The ability of our data to handle change is a critical issue, but sometimes we realise this too late.
The art – or is it a science – of data modelling is a critical part of the data lifecycle though it is often given scant attention. Data modelling brings together the requirements of the system that will use the data and an analysis of the world (or ‘data domain’) that the data needs to describe.
The world can be a surprisingly complex place, yet business processes usually operate using relatively simple rules and concepts. When we define our data, the balance between the requirements perspective and the domain perspective often comes down to how much granularity and detail is written into the data specification.
There is no universal answer to this and a good data specification will strike a balance between building the logic of the system into the data specification and defining granular data building blocks so that the logic can be constructed in the system code.
Managing change in data
Over time things will change: both the domain that the data describes and the rules and logic of the business systems that interact with the world. The pandemic has unleashed change in a way that is both rapid and highly unpredictable. Managing change under these circumstances is immensely challenging and yet critical to the continued delivery of services and information; getting it wrong can result in outcomes that might be interesting and exciting but are never good.
Much has been written about the need for good data management and many providers are now developing capabilities in this area. Change management is a critical aspect of broader data management and one that often crosses the boundaries into systems management and business process re-engineering.
Good data change management depends on an approach and mindset that is disciplined and methodical. A thorough understanding of the need for change is critical to establishing the optimal approach and whether the scope covers data, systems, procedures or all of these. There is often more than one way to resolve a change and the way we define our data – focused on business logic or the detailed description of the data domain – will influence the approach.
The impact of change also needs to be established at the outset. If we change the data to support changes in operational processes what effect will that have on the downstream uses of that data such as internal analysis and external/statutory reporting? Evaluating change options should consider the entire lifecycle of the data before an approach is agreed.
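One way to make this kind of impact assessment concrete is to record which processes and outputs consume each data item and then walk that map outwards from the field being changed. The sketch below is illustrative only: the field name, the consumers and the `LINEAGE` mapping are all hypothetical, and a real institution would draw this information from its own data dictionary or lineage records.

```python
from collections import deque
from typing import Dict, List

# Hypothetical lineage map: each item points to the things that consume it directly.
LINEAGE: Dict[str, List[str]] = {
    "student.mode_of_study": ["timetabling", "fees_calculation", "enrolment_extract"],
    "enrolment_extract": ["internal_dashboard", "statutory_return"],
    "fees_calculation": ["finance_ledger"],
}


def downstream_impact(changed: str, lineage: Dict[str, List[str]]) -> List[str]:
    """Return every direct and indirect consumer of the changed item."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for consumer in lineage.get(node, []):
            if consumer not in seen:  # visit each consumer once, even if lineage loops
                seen.add(consumer)
                queue.append(consumer)
    return sorted(seen)


if __name__ == "__main__":
    for item in downstream_impact("student.mode_of_study", LINEAGE):
        print(item)
```

Even in this toy example, a change to one operational field reaches the statutory return and the finance ledger by indirect routes, which is exactly the kind of consequence that is easy to miss when only the immediate use of the data is considered.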
Experience tells us that quick fixes often prove painful in the long run as unintended consequences wreak havoc on previously stable operations and make any future change more complex and costly. Simply repurposing a field, or an entry in a field, will often provide a short-term solution followed by a very long-term headache.
The existing nature of your data structures will have a significant bearing on the impact of change. As a general rule of thumb, the more granular your data specification, the less likely the need to change it. Granular data structures tend to load more of the business logic onto the system code so change activities are more likely to be about system changes than changes in underlying data specifications.
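To illustrate the rule of thumb, the sketch below stores granular facts about a module (hours delivered on campus and online) and leaves the classification of delivery mode to a function in the system code. The record structure, the field names and the cut-off rule are assumptions made up for this example rather than any real specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleDelivery:
    """Granular facts about a module (hypothetical fields for illustration)."""
    module_id: str
    on_campus_hours: float
    online_hours: float


def delivery_mode(module: ModuleDelivery, online_cutoff: float = 1.0) -> str:
    """Business rule held in code: classify a module from its granular facts.

    online_cutoff is an assumed rule: the share of hours that must be online
    before the module counts as "online". Changing it needs no data change.
    """
    total = module.on_campus_hours + module.online_hours
    if total == 0:
        return "unknown"
    online_share = module.online_hours / total
    if online_share >= online_cutoff:
        return "online"
    if online_share == 0:
        return "on-campus"
    return "blended"


if __name__ == "__main__":
    module = ModuleDelivery("M101", on_campus_hours=10, online_hours=30)
    print(delivery_mode(module))                     # rule before a policy change
    print(delivery_mode(module, online_cutoff=0.7))  # rule after a policy change
```

If the dataset had instead stored a single "delivery mode" code, a change in the rule would have meant redefining the field and reclassifying existing records. Here the stored data is untouched and only the function changes.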
An optimal change management process will also have robust version control procedures. We tend to think of version control as something that applies to lines of system code but the principles of a good version control system apply equally to data models and the specification of individual fields. The ability to track changes in data specifications across the entire life of a dataset might seem like over-engineering a simple issue, but the absence of this type of control raises all kinds of hidden risks and costs as future colleagues attempt to understand the relationship between datasets produced to different versions of the specification. This raises a more fundamental question about the management of metadata and a good data dictionary – but that’s another topic for another day.
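As a rough sketch of what version control for a single field might look like, the example below keeps every published version of a field's valid entries along with a note explaining the change, and can report the differences between any two versions. The class names, the field and its codes are all hypothetical; in practice this record would more likely live in a data dictionary or a source control repository than in application code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldSpecVersion:
    """One published version of a field's specification."""
    version: int
    valid_entries: Dict[str, str]  # code -> label
    change_note: str


class FieldHistory:
    """Append-only history of the specification of one field."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._versions: List[FieldSpecVersion] = []

    def publish(self, valid_entries: Dict[str, str], change_note: str) -> FieldSpecVersion:
        """Record a new version; earlier versions are never overwritten."""
        spec = FieldSpecVersion(len(self._versions) + 1, dict(valid_entries), change_note)
        self._versions.append(spec)
        return spec

    def at(self, version: int) -> FieldSpecVersion:
        """Return the specification as it stood at the given version."""
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"{self.name} has no version {version}")
        return self._versions[version - 1]

    def diff(self, old: int, new: int) -> Dict[str, List[str]]:
        """List codes added, removed or given a new label between two versions."""
        a, b = self.at(old).valid_entries, self.at(new).valid_entries
        return {
            "added": sorted(b.keys() - a.keys()),
            "removed": sorted(a.keys() - b.keys()),
            "relabelled": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
        }


if __name__ == "__main__":
    mode = FieldHistory("MODE")
    mode.publish({"1": "Full-time", "2": "Part-time"}, "Initial specification")
    mode.publish(
        {"1": "Full-time", "2": "Part-time, online", "3": "Distance learning"},
        "Hypothetical pandemic-era change",
    )
    print(mode.diff(1, 2))
```

The "relabelled" list is the one to watch: a code whose meaning has changed between versions is a repurposed entry, and anyone comparing datasets produced under the two versions needs to know about it.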
How well are you doing?
Change management in data specifications is a large and complex topic and there are no formal mechanisms for assessing capability. But there are a few simple self-test questions that you could try. Do you have a defined process for managing changes to data specifications? Do you have a central record or repository of data specification changes? When considering a change to a data specification, do you analyse its impact right across the lifecycle? Has your institution ever experienced the pain of unintended consequences resulting from data changes? If so, I would be fascinated to hear about it.
Andy Youell has spent over 30 years working with data and the systems and people that process it. Formerly Director of Data Policy and Governance at the Higher Education Statistics Agency, and member of the DfE Information Standards Board, he now works with further and higher education providers as a strategic data advisor.