Developers often adopt generative AI (GenAI) because it helps them to code faster, yet the tooling has potential to expose organisations to unaccounted for risks – especially if use is unauthorised or best practice ignored.
“With GenAI, we see both amazing results and stunningly stupid results for the same dev team, and that tells us that we have work to do on the process and tooling side,” says David Colwell, vice-president of AI and machine learning (ML) at test automation supplier, Tricentis. “My personal view is that AI can be the Dunning-Kruger effect incarnate.”
Dunning-Kruger effect is a natural cognitive bias; the less skill or knowledge you have about a given topic, the more likely you are to overestimate competence in that area.
Some team members with an average amount of skill might review a piece of code and think it looks fine. True experts, on the other hand, may look at the same code and see a build full of security vulnerabilities, bad packages and other issues, Colwell explains.
How you deal with that and avoid the risk of multiple new types of errors that you haven’t seen before can be challenging. Tooling is available, but first you need robust security policy, strong and enforced practices and processes that ensure governance. And because GenAI can create code faster, organisations may struggle to test enough to keep up with the rate of code production.
In a 2025 survey by Tricentis, around 63% of 2,700 leaders – mostly executives, managers, and IT professionals – polled admitted shipping untested code, and 90% indicated trusting GenAI to make software release decisions.
Defences against superficially impressive results
Less knowledgeable team members can be asking GenAI tools to build an app for a given task. Resulting code can be superficially impressive if you have no idea about the issues it might contain.
For one thing, software development, AI based or not, must be secure by process. If you commit code, it must pass security scans, validation checks, dynamic scans and the rest. However, you cannot completely eliminate mistakes in code – “code that’s got stupidity in it”, Colwell confirms.
One example of this might be if a user of an age-restricted application or website is under 18 but the code fails to deny access at specific entry points, or if the user accidentally clicks the wrong button or otherwise offers an incorrect response. Those kinds of simple errors must be checked for every time because they are frequent. All code must pass review and validation processes, however created. It all needs oversight.
Of course, thorough documentation of what teams are doing is crucial. And to some extent, AI-powered testing, network monitoring and backlog management tools can help to detect code problems and prioritise changes according to risk.
A McKinsey study suggests that using surveys, existing data and backlog management tools can reduce customer-reported software defects by 20-30%. App discovery software to detect AI usage and data loss prevention (DLP) tools to pinpoint inappropriate information sharing can also prove valuable.
Code coverage analysis tools can trace which parts of code are executed during functional tests. They might identify bits of code not executed during a test, suggesting unneeded or erroneous code that AI introduced. Also, a relevant tool can identify redundant or irrelevant conditions to documented requirements. AIs can sometimes add strange things to code make a test “pass” or satisfy specific situations.
But above all, Colwell notes, defending organisations against risks introduced into code by unauthorised or improper GenAI use means investing in training and education. Organisations would be advised to take note. If you know developer teams are educated well in best practice and the risks if they get it wrong, you can have trust in their policies, documentation and practices.
“Teach your engineers and the people using GenAI the limitations of the specific tooling they have,” Colwell says. “A lot of people will think of GenAI as more or less a magic thinking box, but what you actually have is a natural-language problem-solving box with a short memory, a tendency to answer rapidly rather than find concrete facts, no access to its external environment and amnesia beyond the last point of training.”
Indeed, you may not be able to buy it in. Ankur Anand, CIO of Nash Squared, which owns Harvey Nash IT recruitment, says the AI skills shortage is the biggest in tech for 15 years.
AI skills include understanding how to leverage the platforms and CRM, learning around prompts and “the responsibility that comes with that”, including auditing the results prior to use. GenAI skills are in demand for developers, product managers and project managers as well as data quality, data lineage and data governance skillsets.
Below that, Nash Squared’s May report found increased demand for Python developers with large language model (LLM) knowledge, for example. GenAI has become a “nice to have” in many other job descriptions too.
“This places increasing demands on the tech team as it’s not about just one area,” Anand adds.
Facundo Giuliani, solutions engineer at CMS supplier Storyblok, broadly agrees: “The code generated by GenAI can be a good starting point – supervised by a human who knows what it’s doing and what’s happening in the background.”
Developer skills will remain crucial unless events overtake our future faster than we expect. Meanwhile, it remains imperative to control development processes, especially when multiple teams are involved.
Additional approaches to quality AI coding practice
Giuliani notes that code generated by AI models trained on the public internet are often based on datasets that are not anything like a source of truth. Clues that something is missing or is not right in the code, or simply that copy-and-paste has been deployed or overused, might include long-way-around or tangential solutions.
Are there more bugs than you would expect, or are things happening too fast or slow? Pay attention to productivity metrics, such as DevOps Research and Assessment (DORA) and Space/wellbeing, Activity, Performance, Communication, Efficiency/flow (SPACE) metrics, contribution analysis and talent capability scores.
Formal AI governance and AI model risk management (MRM) is needed. There are also evolving frameworks and standards to help assess AI risk.
International Standards Organisation (ISO) standard 42,001 is about managing AI responsibly, and the US’s National Institute of Standards and Technology (NIST) AI Risk Management Framework (and playbook) are in development.
Giuliani says any patterns out of the ordinary require closer inspection for poor coding practices. “You might see over-complex solutions for simple problems. The same happens with code created by people with no experience. A companion or a mentor should help them elevate their knowledge. A person must become somehow responsible for the code before production,” he says.
Checking code adequately means deploying various techniques or processes, including ensuring a colleague or supervisor does manual code checks before submission to production environments. That’s regardless of how code was created, Giuliani adds.
Jody Bailey, chief product and technology officer (CPTO) at developer community Stack Overflow, broadly agrees: “You need to ensure that what is being put out is still quality. You need oversight and reviews. A lot of folks are using prompts to write their code but even then evaluating those prompts.”
But the challenge for developers has never really been about how fast you type and how fast you write the code. It is more about whether you have the right ideas and are thinking about problems logically and efficiently, Bailey says. He agrees that validating AI might involve using AI. One approach might use Anthropic versus Gemini, for example, because different models have different strengths and weaknesses.
“On the various leaderboards, this can change from month to month. Some are more code-focused, others more general purpose,” he says.
Although you may never completely eliminate the use of shadow IT, more general monitoring can provide assistance, including with tooling for web interactions and endpoint management. But if people introduce something on their own and the results are good, the organisation may well run with that.
“I can’t help but think of sports, where somebody takes a shot and the coach goes ‘No, no, not like that!’, and then the goal goes in and it’s ‘Yay!’ [instead],” says Bailey.
The approaches chosen will depend on circumstances and need, but code must have oversight and quality controls whether GenAI is used or not. The alternative, Bailey adds, is having a very locked down environment where the risk is loss of agility and innovation.