Algorithms have been in the news a lot recently, most notably because of the problems with the award of grades for school and college qualifications in 2020. This has not been a happy episode, and growing public awareness of algorithms has been tainted by some hysterical headlines in the media and some fatuous comments from politicians and commentators.
Algorithms are not new; for as long as we have been processing data, we have been using algorithms. The dictionary defines an algorithm as a set of rules or instructions to be followed in a calculation or function, so every time we process some data we are executing an algorithm.
Like so many areas of data technology, the issues around algorithms often get wrapped up in a myriad of terms and jargon. But if we cut through this, we see that algorithms are an integral part of data processing, and they need management, oversight and assurance as much as the data itself.
The language of analysis has shifted from information to intelligence and then to insight, and it has now reached a new peak with the emergence of the term ‘foresight’. This could be dismissed as pure hyperbole, but to me it signifies a shift in our expectations about what data analysis can achieve. Unlike information, intelligence and insight, foresight is forward-looking; the emphasis has shifted from understanding the world as it is to predicting what the world will be. Crystal balls and tea leaves have had their day: we have algorithms!
The problem with predicting the future is that the future is inherently unpredictable. We can analyse the past to understand the likelihood of future events and establish which types of event can be predicted with a high degree of confidence (eg the sun rising tomorrow) and which cannot (eg next week's lottery numbers). Algorithms can be tested against historical data to establish the extent to which their predictions match the pattern of previous actual outcomes. This testing can be iterative, with individual refinements to the algorithm evaluated in turn until the optimal algorithm is identified.
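To make that testing loop concrete, here is a minimal sketch in Python. It assumes the "algorithm" is simple exponential smoothing, a basic forecasting technique chosen purely for illustration, with a single tunable parameter `alpha`; each candidate value of `alpha` stands in for one refinement. The function names and the historical series are hypothetical. Each version is scored by how closely its one-step-ahead predictions match what actually happened, and the best-scoring version is kept.

```python
from statistics import mean


def smoothed_forecasts(series, alpha):
    # Simple exponential smoothing (illustrative choice of algorithm):
    # forecasts[i] predicts series[i + 1] using only values up to series[i].
    level = series[0]
    forecasts = []
    for value in series[:-1]:
        level = alpha * value + (1 - alpha) * level
        forecasts.append(level)
    return forecasts


def backtest_error(series, alpha):
    # Mean absolute error of the forecasts against what actually happened.
    forecasts = smoothed_forecasts(series, alpha)
    return mean(abs(f - a) for f, a in zip(forecasts, series[1:]))


def choose_alpha(series, candidates):
    # Each candidate alpha is one "refinement"; keep whichever fits history best.
    return min(candidates, key=lambda alpha: backtest_error(series, alpha))


# Hypothetical historical series, invented for illustration.
history = [10, 12, 11, 13, 12, 14, 13, 15]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]

best = choose_alpha(history, candidates)
print(f"best alpha: {best}, backtest error: {backtest_error(history, best):.2f}")
```

"Optimal" here only means the best of the candidates tried, on the history available. It says nothing about how the chosen version will cope with the future, or with any individual case.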
However much analysis and testing you do, predictive algorithms still face two major challenges. The first is the residual uncertainty that exists in any prediction. Even if you have analysed the historical patterns and assessed and mitigated all of the risks that you can, there will always remain an element of uncertainty when predicting the future. There will always be a gap between prediction and reality.
The second challenge is perhaps more subtle. The analysis of historical trends and the testing and refinement of predictive algorithms can really only take place against large sets of data. In fact, the larger the dataset, the greater the (statistical) confidence in the analysis and the algorithm. However, this confidence in the big picture can mask all kinds of edge cases and oddities down in the weeds. Confidence that an algorithm will predict the right outcome at a high level does not necessarily imply confidence that it will predict the right outcome for each individual.
Algorithms are based on summarised patterns which mask cases that deviate from the norm. Anything unusual or exceptional tends to get dismissed as ‘noise’ in the data; algorithms generate standardised outcomes in a non-standard world.
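The gap between big-picture confidence and individual accuracy can be shown in a few lines of Python. The grades below are invented for illustration and are not drawn from the 2020 process; the function names are hypothetical. The predicted grades reproduce the actual grade distribution exactly, so any check at the level of the whole cohort would pass, yet most individuals receive the wrong grade.

```python
from collections import Counter


def distribution_matches(predicted, actual):
    # Cohort-level check: are the same grades awarded in the same proportions?
    return Counter(predicted) == Counter(actual)


def individual_accuracy(predicted, actual):
    # Individual-level check: what fraction of people got their actual grade?
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical cohort of ten students, invented for illustration.
actual = ["A", "A", "B", "B", "B", "C", "C", "C", "D", "D"]
predicted = ["B", "A", "A", "C", "B", "B", "D", "C", "C", "D"]

print(distribution_matches(predicted, actual))  # True
print(individual_accuracy(predicted, actual))   # 0.4
```

Six of the ten students here would receive the wrong grade, even though the overall grade profile is exactly right.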
Predictive algorithms offer huge potential, but they must be used with care and their limitations need to be fully understood. I think there are two key principles that need to be borne in mind when using algorithms in this way.
The first is the approach to developing and testing algorithms. This needs to consider both the high-level outcomes and the extent to which the algorithm is capable of generating meaningful outcomes at the micro-level. In addition to tests against historical data, the effectiveness of an algorithm should be tested against the policy or business objectives that it is trying to meet. The risk of unintended consequences can be high, and it needs to be fully understood and assessed.
The second principle concerns how algorithms are used in decision-making. We can think of this as the difference between the algorithm informing the decision and the algorithm being the decision. If you apply for a bank loan, then a credit score might be used by the bank as part of its decision-making process. The credit score algorithm provides a valuable reference point, but other factors may be taken into account in making the decision. If the output of the algorithm becomes the decision, as we saw with the 2020 qualification grades, then the ability of the algorithm to predict the correct outcome at an individual level becomes absolutely critical.
Our increasing use of algorithms in decision-making brings opportunities and challenges. The ability to automate decision-making in a complex and dynamic world is a tantalising goal, but one that is fraught with risk.
Andy Youell has spent over 30 years working with data and the systems and people that process it. Formerly Director of Data Policy and Governance at the Higher Education Statistics Agency (HESA) and a member of the DfE Information Standards Board, Andy is now a strategic data advisor. A school governor, Andy was also recently a member of the Independent Review Panel on 2020 qualification grades in Wales.