This piece was republished with permission from the Bureau of Investigative Journalism. It was produced in partnership with the Pulitzer Center’s AI Accountability Network.
Six weeks before the US launched its brazen mission to capture Venezuelan president Nicolas Maduro, a spy plane was detected flying figure-of-eight loops above the Atlantic near the country’s border with Guyana.
The US Air Force had sent the Rivet Joint, one of the world’s most powerful surveillance aircraft, to gather intelligence before a hostile operation. The UK also owns three of the planes and last month dispatched one to the North Atlantic to support the seizure of the Marinera tanker, which the US said had transported sanctioned oil for Venezuela.
Equipped with cutting-edge military technology, the Rivet Joint can pick up radar signals, geolocate enemy systems and intercept communications from 150 miles away.
Exactly how it does this is kept strictly under wraps. But we can reveal that an Australian tech company called Appen has performed work for a secretive US military unit, code-named Big Safari, that installs the planes’ tech systems.
Appen recruits gig workers from all over the world to help train AI systems. The company’s latest annual report says it has a workforce of a million people who speak over 500 languages.
Many of these gig workers are paid very little. Some are from countries that have faced attacks from US armed forces. None were told by Appen that they may have been working for the US military.
In fact, Appen served various branches of the US military for well over a decade. We have seen defence contracts worth $17m awarded to the company by military agencies between 2005 and 2020. The contracts were for various linguistic projects, including $145,000 for the Rivet Joint work.
Appen’s gig workers were not given full details about the purpose of their work, as is standard industry practice. Where the datasets were sold to military agencies, their intended use remains unclear. The company did not respond to our requests for comment.
“They were secretive about the ultimate goal,” said Hassan*, who worked on Somali transcription and data collection projects for Appen. “They never share like that. They will only give us guidelines or instructions for conducting the tasks … but besides that, we didn’t know where this data was headed.”
Long before Google Translate, Siri or ChatGPT were household names, Appen was helping build the systems that allow us to speak to our computers.
Founded in Sydney in 1996, Appen became a market leader in selling data that would be used to help computers “learn” a language by recognising patterns in text or speech.
When an iPhone user asks Siri to set a reminder, the process seems entirely automated. In reality, this dialogue is only possible because companies like Appen have supplied Apple with material produced by countless hours of manual labour done by low-paid gig workers.
Appen’s workers, many of them based in poorer countries, helped the company build an expansive catalogue of text and speech datasets. These were then sold on to companies developing software that could process, translate and transcribe human language.

Oliver Kemp/TBIJ
These technologies have also long been of interest to the US government, which for decades has ploughed hundreds of millions of dollars into linguistic research programmes. (Apple’s voice assistant Siri in fact has its origins in a military research project.)
Contracts show that from 2005 onwards, Appen was involved in several military projects, mostly funded by the Air Force or the US Army’s contracting arm. Some were for research and development purposes, while others seem more closely related to aerial warfare.
Between 2015 and 2017, Appen worked on three subcontracts to provide language data files for a project called “tactical language interpreter”, valued at $287,500. One mentions the Rivet Joint spy plane.
As well as gathering information ahead of operations, as in Venezuela, Rivet Joints conduct long-term surveillance, explains Christoph Bergs, an airpower analyst at defence thinktank RUSI.
“Whatever you’re intercepting, you begin to understand how the people behind that think,” he says. “It can provide insights on very specific elements, but also more wider big-picture geopolitical analysis.”
The planes have formed a crucial part of the US arsenal for decades, flying their first missions as far back as the Vietnam War. In 2010, three were sold to the RAF and the UK works closely with the US on Rivet Joint missions. When we asked the Ministry of Defence about the Appen contracts, it declined to comment.
In recent years, they have been dispatched towards Russia’s western border, the Chinese coastline and Gaza, where an RAF Rivet Joint patrolled the eastern Mediterranean in the days following Hamas’ attack on Israel.
Using other kinds of aircraft, the RAF flew near-daily surveillance missions around Gaza until the ceasefire agreement, though the MoD says these missions were only related to hostage recovery and have now ended. How Israel used the intelligence gathered by RAF spy planes remains, like so much of the Rivet Joint programme, a mystery.
What we do know is that because of their age, Rivet Joints need regular updates to ensure they’re fit for modern warfare. Every few years, they’re gutted and revamped by a specialised and highly secretive Air Force unit known as “Big Safari”, which rapidly modifies surveillance aircraft.
Bergs likens this electromagnetic arms race to “a game of chess”.
“You make a move and then another actor makes another move,” he says. “You have to ensure that the capabilities … are up to date and able to not just intercept all this raw data, but then also store it and analyse it within a relatively rapid timeline, as part of assisting the human operators.”
This is the job of the Rivet Joints: to capture vast amounts of data to be decoded by their crews, which include specialised military linguists, and by analysts on the ground. Budget documents from 2015 show the Air Force wanted to upgrade the fleet’s capability to analyse audio data.
Beyond this, we know little about the software on board the planes. We don’t know if Appen’s data was used to process information gathered by the Rivet Joint rather than, for example, to test out a new capability or to train its crew.
The only other reference we could find to the mysterious “tactical language interpreter” system for which the Air Force bought Appen data was in a veterans’ newsletter, which describes an AI system designed to “streamline voice data processing”.
We sent freedom of information requests to several US military units about the nature of Appen’s work. Most did not respond or said no relevant records were found. One sent back 720 pages of entirely blacked-out redactions.
At Appen’s offices in Sydney and Washington, a culture of secrecy also surrounded the company’s defence work. We interviewed nine former Appen managers, all of whom said that the specifics of its military projects were closely guarded.
“I came in as someone who can do the harder languages, where you don’t have native speaker linguists … because there’s been a civil war for the past generation,” said Will*, who managed teams of Somali gig workers.
Somali is known in industry parlance as a “low-resource” language because there is not enough readily available data to train a computer model effectively. In these cases, companies like Appen are all the more important in building and selling bespoke datasets.
“I remember spelling standardisation being a lot of fun with Somali because they wrote in Arabic [script] until the 1970s, and there’s been wars … and you’ve got all the different dialects, and the only dictionary we could find was Italian to Somali,” said Will.
We have seen two military contracts held by Appen that specifically mention Somali. One involved the sale of a database of Somali telephone conversations and transcriptions used to build speech recognition systems. The other was an Air Force research project related to “speech and audio exploitation technologies for support of military applications”.
Both were categorised as “advanced research and development”, the stage at which military research is used to build prototypes and is tested with a certain application in mind. (We have not seen any evidence to suggest that either of these contracts was related to the Rivet Joint.)
Recruiting and paying people in Somalia was considered too difficult due to international sanctions, according to Will and other Appen managers.
He said Appen looked to hire Somali speakers in Kenya, which as well as being a major gig-work hub is home to one of the world’s largest Somali diasporas. The country’s Somali community totals almost 3 million, more than 300,000 of whom live in refugee camps having fled famine, drought and a civil war which began over three decades ago.
The US military has been actively engaged in conflict in Somalia since at least 2007 against terrorist groups such as Al-Shabaab. US forces have killed between 93 and 170 civilians in this period, according to conflict monitoring group Airwars.

Ismail*, who left Somalia as a child, began working for Appen while living in Kakuma, a desert refugee camp in north-west Kenya so remote that its name means “nowhere” in Swahili.
He worked for Appen at various points between 2015 and 2018, on Somali transcription and translation projects. He can’t recall the exact rates, but said he felt the pay was good at the time.
Ismail said Appen was “booming” and a lot of his friends were also working on the platform. He said they often wondered about the purpose of the Somali transcription work, which involved listening to audio – sometimes from news reports or public meetings – and typing out exactly what they heard, including interruptions and background noise.
“The voices were quite hard,” he said. “Sometimes it was muffled, sometimes you had to listen to one sentence like three to four times to make it accurate, because if you don’t get it right, it comes back.”
Ismail did not know who the work was being done for. Even the name of the project manager was anonymised in the Appen platform. “When you work on something, it’s good to understand what it entails,” he said. “My friends and I, we used to ask each other, we are transcribing these things – what is it about?”
“What’s the intention of recording a meeting and telling us to transcribe? We could not come to a direct conclusion … at that time, I personally could not understand what it was about.”
The content tended not to offer many clues. “It was not like a full story,” he said. “You open this recording, it’s two minutes, it’s about a topic, like someone in the market. The next one can be someone talking saying, ‘Today we’re having problems, and the security of the city is not doing well’ – so it was not something that was chronological.”
The opacity of gig work platforms like Appen means we don’t know whether these specific transcription projects were related to defence contracts. But it’s possible that Ismail, having fled conflict in Somalia, was unknowingly helping the US military from a Kenyan refugee camp.
Even those managing teams of Somali gig workers, like Will, were given limited insight into the purpose of their projects. While he understood the technical aims, he didn’t always know why the client needed Appen data or how it would ultimately be used.
He was aware he had worked on projects for military agencies but in general his impression was that these were “useful and helpful … peaceful rather than being used for evil,” he said.
“Except for one instance where I was uncertain and sort of felt very like I was part of a war somewhere. And that was uncomfortable – that I couldn’t get further answers.”
Appen is just one company among many providing training data to the world’s most powerful tech companies. These data providers, though little known outside of the industry, play a crucial role in fuelling the AI boom. They employ millions of gig workers like Ismail, who know little about the systems they are building and are often paid poorly for their work.
In recent years, a number of new groups have sprung up to give these data workers a collective voice. The Data Labellers Association, founded in Kenya last year, says these workers, who it calls “the invisible architects shaping the future of technology”, also face precarious contracts, mental health challenges and limited growth opportunities.
Joan Kinyua, president of the Data Labellers Association, said a lack of transparency in the training data industry was another key issue. “I feel like it would be very important if [companies] just disclose information like who are we working for, what is the purpose of this,” she said.
“Because at times you might do a project and then you find you’re putting other people in danger, or it does not sit well with your morals or even with your culture.
“There’s some things you will do, and then once you find out, then you’re going to continue blaming yourself … it’s very important if there’s a bit of transparency over what you’re working on.”
For now, gig workers in Kenya remain in the dark about the projects they take on, building datasets which could be sold to top-secret military clients, private sector tech companies, or both.
