The Gap

For a number of years the team I worked in was (amongst many other things) an escalation point for the software support team of our platform. For a shorter period, when they were short staffed, we took on a more direct support role, and I still maintain that I learned more about that system in those months than at any other time. I still believe that if you want to understand a software system and how it really works, you should spend some time doing support. This is different from the notion of ‘if you build it, you support it,’ which is a way of optimising engineering knowledge to best serve the maintenance of the software, but can also have the effect of siloing or limiting contextual knowledge. Getting someone to spend time on frontline support is about expanding their knowledge of the whole system and how it is actually used, rather than how the bit they built should be used (before it’s been mangled by the first contact of UAT). 

Anyway, the thing about supporting a system like that, a 25-year-old system serving hundreds of business clients with millions of end users, hundreds of millions in monthly transactions and billions in assets, was that it was complicated. Everything was complicated. Tracing an issue through to its cause was not necessarily straightforward, and whilst we had a comprehensive knowledge base of known issues and remedies, unknown scenarios would crop up all the time. Sometimes these were the result of an edge-case user action (someone did a bunch of things in an odd order that no one would ever imagine a person would consider, but hey, people be people); sometimes they were systemic or configuration issues that could be fixed at a client or platform level. In the few months that I did direct support, across maybe 300 issues, I think about 90 had an obviously known cause that I could find in our knowledge base, and maybe another 120 or so could be resolved by looking at known issues, the data and the system output and extrapolating a fix. That leaves maybe another 90 that required deeper analysis, modelling in a test system, wider consultation and/or full debugging. These were effectively previously unknown issues with a non-obvious cause. There was no reference data for them, and often determining the relationship between cause and effect required a non-linear association with other processes for other clients. These were the cases that always got escalated to us anyway (and often on to the senior engineers).

I was talking to a friend the other day, who pointed out that the pattern of task replacement by (generative) AI in white-collar workplaces will be bottom up. The first tasks to be automated will be those of the intern or other entry-level positions, the apprentice, graduate trainee or whatever. If the technology can prove itself (and, some argue, even if it can’t) then it will gradually move up organisational hierarchies, automating ever more complex tasks until… what? Really there are two potential outcomes. First, if you believe the hype, AI will gradually ‘learn’ all the tasks, replace all the jobs and we will all be forced to live a life of leisure whilst the machines do all the work. This vision is entirely wishy-washy and glosses over many practicalities, not least where the money will come from to allow us all to quit work (the answer is nowhere). The reason for this is that broader ‘strategies’ around AI are not concentrated on the kinds of things that the rest of us are required to focus on: clear goals with measurable outcomes. Instead we get vague promises of things like AGI, which will somehow solve everything.

When I was a teenager, my band went to a conference at Salford University, where one of the highlights of the day was a speech by a real live A&R man. He kinda killed the vibe by telling a room full of musicians that only maybe one or two of them would get a record contract. He was wrong about that, because three members of my band got at least five record contracts between them. Hah! But what he wasn’t wrong about was that getting a record contract was just the start of the real work. In my teenage brain, a record contract was not a means but an end in itself; it was the goal, and what would follow would just be happy ever after. I can’t help but think the concept of AGI is just the technology equivalent of my teenage record contract: a convenient happy ever after that we tell ourselves to avoid the fact that there is no easy way out. Quite apart from the fact that generative AI is as close to super intelligence as a house brick is to a pat of butter, the idea that an AGI will suddenly be able to solve all our most intractable problems (cure cancer, fix the climate crisis, solve poverty, come up with the perfect wedding seating plan) is laughable. Even if we suppose that an AGI could come up with the answers to all our problems, it isn’t omnipotent; it can’t just enact its will entirely unmitigated. The climate crisis already has a solution. If 8 billion people can’t put it into effect, how is one super smart computer going to? The reality of course is that we are nowhere near AGI. Think about the amount of power, energy and compute it takes to make bad art or write boilerplate copy, and then think about how much those things have improved in the last two years. Does the improvement you have seen in that time equate to a logical leap to super intelligence in the next two? In AI images from two years ago, people had six, or three, or seven fingers; now they usually have five.
By comparison, children usually learn to draw the right number of fingers just before they finish, erm, year 2?

The problem is that most people (very much including those doing the coding) can’t differentiate between the fact that what is being done with LLMs is very clever (i.e. it is very clever to be able to make a computer do what they have made a computer do), and whether the LLM itself is clever (not so much). It is a universal fact of computing that the amount of intelligence that goes into creating any programme is 1,000 times the amount of intelligence displayed by the programme itself. With AI, programmers have attempted to embed more of the intelligence into their programmes by stuffing them with data as a substitute for knowledge, and by making them, in effect, lots of little programmes that run together. Unfortunately, due to the amount of data that is referenced and the number of processes that are run, no one knows exactly what an LLM or diffusion model is doing when it is generating text or images. Obviously a data scientist can explain the overall theory and process, and can ‘train’ the algorithm to give certain types of output based on certain types of input, but they couldn’t tell you the exact words an LLM will write if you ask for a 100-word review of Pride and Prejudice in the style of Hunter S Thompson, or the exact picture you would get if you asked a diffusion model for an image of Donald Duck flicking a bogie at Donald Trump. Because of this unpredictability, even the people who know what the model is doing are inclined to ascribe intelligence to it. This is human nature: we want to see meaning, purpose and agency in everything. After Joseph Weizenbaum created ELIZA in 1966, he was disturbed by how quick everyone was to ascribe intelligence to what was actually a fairly simple computer programme. Some say the last straw was when his secretary asked him to leave the room while she was ‘talking to’ ELIZA. Weizenbaum stopped his work on ELIZA in 1967 and never made another chatbot.
In fact he became a staunch critic of AI and computers in general, seeing them as ways of reinforcing and automating fundamentally flawed and ‘conservative’ systems. Regardless of what you might think of his later work, Weizenbaum proved that we have a strong drive towards assigning human attributes to machines even when they provably don’t exist. Intelligence is obviously the key human attribute we want to project onto machines, because what does it say about us if we can create intelligence in another? So coupled with the natural human drive for anthropomorphism, you have the inherent self-flattery of believing that you are the person/team/generation that can create something beyond human intelligence, that you can in effect create god. These factors seemingly combine to prevent many very intelligent people from seeing the yawning chasm between a vast network of compute-intensive algorithms that can produce passable business documents and even basic human intelligence. All of which is a very long way of saying AGI or machine super intelligence is a mirage, a distraction from the very real considerations that we should be taking into account around the proliferation of the generative AI that we have today.

This leads us to the second (more plausible) scenario, in which generative AI gets incrementally better over the next few years and gradually automates away tasks and roles, starting with the ‘basic’ tasks and entry-level roles and working up. If we see this as a simple bottom-up, role-based process, there is a real sense of AI pulling up the ladder behind it. If the roles in which people learn their ‘craft’ are automated away, you create a skills gap, and there is no one to fill the support escalation point role from my example when the person currently doing that role gets promoted, leaves or retires. The obvious answer is that AI will move in to fill that gap, but this is where the problem occurs: the gap is not a data gap, it is a knowledge gap. With Large Language Models, we often confuse data, and an ability to reference data, with knowledge, but there is an important difference: knowledge in a human allows the imagination to create an entirely new thing; data in an AI only allows it to create new variations of an existing thing. As I said before, there is a percentage, 30% or whatever it may be, of that senior platform support role that requires a degree of creative, imaginative thought that simply doesn’t exist in a reworking of all the data that already exists. What happens when we’ve failed to give the humans the knowledge to do that thinking? There is nothing to suggest that AI can do it for us.

But let’s suppose it can. Let’s take a leap of faith, against rationality, and assume that AI will be able to perform creative-thought-type processes that do not directly reference existing data*. What are the costs? I don’t mean the energy costs, the compute costs or the environmental costs, but simply the pure business costs. Currently, if you’re using generative AI to carry out some of your easier workflows or create some of your basic documents, it’s probably relatively cheap. That’s not the actual cost; that’s what the AI companies charge you in order to get you hooked on their product. In the scenario that generative AI does become capable of more complex tasks, that capability is not just going to fall like manna from heaven; it’s going to arrive as an upgrade with a commensurate price. Advancements in generative AI usually come with an increase in compute, which means more data centres using more chips. Neither of those things is going down in price, and at some point the people who have poured hundreds of billions of dollars into this technology are going to want to see some return on their investment. In this scenario companies will undoubtedly end up paying more for the AI to do the work than they would have paid for people to do it, but hey, it’s too late, because they didn’t train up those people, so they have no choice.

The more likely scenario is that generative AI has an ability limit, and that limit is contextual decision making. Amid the anthropomorphic language of AI, it is easy to forget that an LLM doesn’t ‘think’ or ‘decide’ or even ‘hallucinate’; it simply derives the most probable right** answer based on the information available. If that information is only tenuously linked to the question, the answer will only be tenuously linked to ‘right’. That is not a hallucination; it is simply the fact that the model is designed to arrive at the most probable answer regardless of how low that probability is as an absolute measure. If the most probable answer has a 5% chance of being right, and every other answer has a 4% chance of being right, the 5% answer is ‘right’. Again, the current answer to this problem is to use bigger data sets so that more things can be closer to ‘right’, but that has the unfortunate side effect of making it harder to find the ‘wrong’ answers, or more accurately the areas where the data is less close to ‘right’. ‘Training’, as a process in which the data associations are skewed towards the ‘right’ answer, will work for common problems or known scenarios where enough of the right data exists to associate with, but where there isn’t enough data, or any data at all, the model cannot take information that it has ‘learned’ elsewhere and apply it in a different context to get an entirely new answer. There is no maths that can do that.
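The 5%-versus-4% point can be sketched in a few lines of Python. This is a toy illustration, not how any real model is implemented; the candidate answers and probabilities are invented for the example. The mechanics, though, are the same: pick the highest-probability option, however low that probability is in absolute terms.

```python
# Toy illustration: a model picks the single most probable answer,
# even when that answer has only a 5% absolute chance of being right.
probs = {
    "plausible answer": 0.05,
    "alternative A": 0.04,
    "alternative B": 0.04,
    # ...the remaining probability mass spread thinly over many other options
}

# argmax over the candidates: the 'right' answer by construction
best = max(probs, key=probs.get)

print(best)         # plausible answer
print(probs[best])  # 0.05 -- confidently reported, despite 95% odds of being wrong
```

Nothing in this selection step distinguishes a 95%-confident answer from a 5%-confident one; the output looks equally assured either way, which is the whole problem.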

So the AI will grind away at the easy stuff, removing human opportunities to learn, gradually eroding the knowledge base from the bottom up, so that when the current crop of experts who have augmented their work with AI leave or retire, that knowledge will be lost. And that sort of knowledge cannot be stored; it is not data or data relationships or vectors. For some time now, humans have not needed to store information in their heads. It has been at least 15 years since software engineers stopped needing to ‘learn’ a programming language in order to code in it. Sites like StackOverflow allowed you to find the answer to pretty much anything, and if you had forgotten something basic (or didn’t know the exact syntax for a particular language), you could look it up somewhere like W3Schools. Obviously, the more you code in a particular language, the more you remember how to do specific things, but that kind of ‘basic memory’ knowledge is non-essential; the internet and digital storage basically replaced memory shortly after the turn of the millennium. It’s been possible to ‘vibe code’ for years, it just used to take longer.

Unfortunately the way we measure knowledge in people has failed to keep up (and has in many ways gone backwards). We still mainly test people’s knowledge (and by association their intelligence) by testing their memory of facts. We might dress it up as essay questions, but primarily we are testing someone’s capacity for recall as a way of assessing their ability to deal with a world that requires them to have no capacity for recall. I have railed against this before, but in the UK at least, the back-to-basics type school reforms of the 2010s were a step backwards in terms of actual useful education, preparing people to be part of a workforce of a bygone era, teaching them skills they will never need. That’s not to say it’s not easier if you know your times tables, but it’s not essential when you carry a calculator with you all the time. Unfortunately, this type of recall is much easier to quantify than any other type of knowledge; it is easy to show improvement or failure. It is reductive and dehumanising, but then it is the product of industrialisation, so that’s not really surprising. The kind of Victorian basic education of rote learning from which our mainstream education system springs was designed to prepare people to work in factories, and was gradually modified to prepare them to do the kinds of jobs now largely rendered obsolete by technology. We hold on to it because it is easily measured, easily categorised, easily administered and entirely removed from everyday life experience, not to mention that people who only ever learn facts are easier to control than people who learn to think for themselves and evaluate information. This is not a criticism of educators, who know that their job is to apply context to the world, but of the policymakers who merely want to have numbers to talk about and a compliant workforce.

In my youth it seemed that every year or two there was a local state school student who was going to (and inevitably did) get top grades at A-Level yet was scandalously denied a place at Oxbridge. This was always framed as discrimination, which in a way it was, but not in the way that the local paper implied. The problem was that (in those days at least) Oxford and Cambridge were looking for people who could do more than just pass exams. I don’t know if this is still the case; in the era of high student fees, it feels like undergraduate education is entirely transactional, people pay for an education and expect to get it, presumably including quantifiable proof that they did it ‘right’. It is, of course, this kind of quantitative testing that is also deemed a measure of ‘intelligence’ for AI, with the benchmarks used to assess these models all being based on exactly the types of exams that prove nothing about human knowledge or ability. Unfortunately this means that you’ll get AI that is good at passing the tests but probably wouldn’t get into a decent university. This is a long way of saying we’re educating the machines as badly as we’re educating the kids, except with kids it’s possible to teach them another way; with AI, not so much. Generative AI is built to ‘learn’ this way and the chances of it doing anything else anytime soon are minimal.

So we have the potential to automate ourselves into a bit of a knowledge crisis if we’re not careful. In technology we are becoming adept at believing our own hype, getting really excited about the next big thing without thinking too hard about what the next big thing actually does or what its actual long-term impacts might be. Despite the claim that the really smart people are doing the thinking about technology, it’s clear that most of them think about technology strategy in the same way as everyone else thinks about their work: what works for me right now? I was chatting to an engineering manager (not from my company) about some of this stuff the other day and he basically said, ‘AI is a bubble, it’s going to burst and I don’t know what will be left afterwards, but hey, have you tried building a virtual engineering team out of LLM agents?’ To me that’s not a strong indicator of strategic thinking, but then maybe I’m being unfair; his job is to get his team to deliver the code as quickly and as efficiently as possible. Nothing else. There are principles about things like clarity, reusability, interoperability and maintainability, but these are just principles that can be aimed for, held in mind or cast by the wayside due to commercial pressures. Really it falls to people like myself, people in the solutions and business architecture space, to think about these things in anything like a strategic manner. Approaches like TOGAF are very clear on the need to define frameworks and principles for implementing software change, but they are not clear on what those principles should be. We address the how in great detail, but leave the what and why up to business practice or business culture. We might need to rethink this a bit. Obviously there is personal pride in applying principles to an architecture framework that hopefully lead to good practice and good software design, but this is something more than that; this requires an architecture of approach and practice.
What are the human inputs, outputs and dependencies of a system? Not just in terms of users, but, most importantly, in terms of maintenance and sustainability. I know that in the age of automation we like to pretend that humans aren’t involved in algorithms at all, but generative AI itself gives the lie to that idea. In the immediate term, automation via AI may yield efficiencies (although studies tend to suggest otherwise), but what is the cost in the medium term? Research already suggests that even short periods of consistent use of LLM-based AI atrophy critical thinking in users, so mandating the use of these tools should be carefully considered by businesses. By mandating the use of generative AI, a company risks dumbing down its own workforce with very little evidence that AI will be able to completely replace it any time soon, meaning most businesses would be deliberately degrading their biggest assets. On top of this, there must be the very real risk of class action lawsuits in future. If an employee chooses to use AI to do their job and it impairs their critical reasoning, that’s akin to them choosing to smoke on their break and getting lung cancer: it’s on them. If their employer mandates the use of AI, that could be seen as the cognitive equivalent of handing their employees a pack of cigarettes and telling them smoking at their desk is compulsory.

Regardless of the potential impact on individual employees, a human architecture needs to take account of the total impact of technology decisions and cannot be solely an HR function any more. The planning of business and technology architecture needs to take much more detailed account of human architecture, and vice versa. Currently I suspect too many companies are leaving their strategic approach to AI and human architecture in the hands of generative AI’s most vociferous promoters, the ones who have the most to gain from the belief that AI is months away from solving all our problems: the people selling it.

Of course there is a (very small) chance that they are right, and that superintelligent AGI is about to replace us all. But if that is the case, why would I buy your janky AI agent now? I might as well save the money for my early retirement. 

*I don’t think I can stress enough what a total stretch beyond logic this is.

**‘right’ in this case meaning an aggregate of what all the people who interacted with all the data had a positive impression of, rather than some concept of objective truth.

