2019 marks the twenty-fifth anniversary of the first-ever HESA submission. As we now stand on the brink of the new era of Data Futures, I thought it might be a good moment to reflect on how student data collections have evolved over the past 25 years and what the future might hold.
Don’t look back in anger
The first HESA collection was a massive upheaval for data operations in the sector and was delivered to timescales that now look unfeasible. HESA first opened its doors for business on 2 January 1993 with six staff. The first draft of the data specification was published towards the end of 1993, and the first submissions were made 12 months later. By comparison, the Data Futures programme – then called ‘CACHED’ – first emerged in 2015, and we are still a long way from the first collection. This huge increase in lead time shows how much more complex HESA collections have become since 1994, both in their data structures and in the processes for collecting and quality-assuring the data.
The other big shift in Data Futures is the move to more timely data collections, with three reference periods every year. This feels like a major change for HE, and I know that many institutions find it a daunting prospect, but colleagues in the FE sector have been making monthly returns for many years. Even once the Data Futures model is fully established, HE is still going to seem a little pedestrian by comparison. The Data Futures model moves the sector closer to transactional data collection, where data is sent as each event happens. While I can’t see a truly transactional collection happening in the foreseeable future (ever?), I can see future iterations of data collection shifting to a more frequent pattern than the current plan.
Whatever you want
Of course, the big driver in these data collections is the requirements of funders and regulators across the UK, so any conversation about future trends in data collection needs to look beyond the technical issues and gaze deep into the depths of policy debates. There are any number of policy arcs that could emerge over the next few years that will affect the types of data required. Changes to funding, whether from the Augar review or the reviews that will inevitably follow, will shift which areas of the data are treated as priorities, be it qualifications on entry, socio-economic indicators or whatever other aspect of the data is the policy priority of the day.
Whatever future models of data collection emerge over the coming years, the need for high-quality data is not going to diminish. That first HESA data collection in 1994 had a data quality regime that consisted of 250 simple validation rules across the 134 fields in the record… and nothing else. Since then, the data quality agenda has driven every aspect of the data collection process, and the reputation of the national student dataset stands or falls by its quality. Data quality assurance now involves a massive range of validation and credibility checks, with the HESA collection platform linking up with funders and regulators across the UK to generate a swathe of reports and checks on every file that providers submit. These include online checks against data held by awarding bodies and the Student Loans Company (SLC).
Data Futures will see a raft of new tools and methods for data quality assurance, and providers will need to deliver data that meets ever-increasing quality standards in a far more timely fashion. This trend is not going away: whatever happens to the details of the data specification and collection schedule, expectations around data quality will only increase. Providers that do not rise to this challenge are going to find the future a very uncomfortable place to be.
Andy Youell has spent over 30 years working with data, the systems that process it and the people who work with it. Formerly Director of Data Policy and Governance at the Higher Education Statistics Agency, and a member of the DfE Information Standards Board, he now works with further and higher education providers as a strategic data advisor.