Last week, our lead software engineer, Nelson Masuki, and I presented at the MSRA Annual Conference to a room full of brilliant researchers, data scientists, and development practitioners from across Kenya and Africa. We were there to address a quietly growing dilemma in our field: the rise of synthetic data and its implications for the future of research, particularly in the regions we serve.
Our presentation was anchored in findings from our whitepaper, which compared outcomes from a traditional CATI survey with synthetic outputs generated using several large language models (LLMs). The session was a mix of curiosity, concern, and critical thinking, especially when we demonstrated how off-the-mark synthetic data can be in places where cultural context, language, or ground realities are complex and rapidly changing.
We started the presentation by asking everyone to prompt their favorite AI app with some actual questions to model survey outcomes. No two people in the hall got the same answers, even though the prompt was exactly the same and many people used the same apps running the same models. Problem number one.
The experiment
We then presented the findings from our experiments. Starting with a CATI survey of over 1,000 respondents in Kenya, we conducted a 25-minute study covering several areas: food consumption, media and technology use, knowledge of and attitudes toward AI, and views on humanitarian assistance. We then took the respondents' demographic information (age, gender, rural-urban setting, education level, and ADM1 location), created synthetic data respondents (SDRs) that exactly matched these respondents, and administered the same questionnaire across multiple LLM providers and models (we even ran repeat cycles with newer, more advanced models). The variations were as varied as they were skewed – almost always wrong. Synthetic data failed the one true test of accuracy – the authentic voice of the people.
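To make the SDR setup concrete, here is a minimal sketch of how a persona prompt could be composed from a real respondent's demographic profile before posing a survey question to an LLM. The field names, wording, and example respondent are illustrative assumptions for this post, not GeoPoll's actual pipeline.

```python
# Hypothetical sketch: build a persona-conditioned prompt for a "synthetic
# data respondent" (SDR) from a real respondent's demographics, so an LLM
# can be asked to answer the same questionnaire. Illustrative only.

def build_sdr_prompt(profile: dict, question: str) -> str:
    """Compose a persona-conditioned prompt for one survey question."""
    persona = (
        f"You are a {profile['age']}-year-old {profile['gender']} living in a "
        f"{profile['setting']} area of {profile['adm1']}, Kenya, with "
        f"{profile['education']} education."
    )
    return (
        f"{persona}\n"
        f"Answer the following survey question as this person would, "
        f"choosing exactly one option.\n\n"
        f"Question: {question}"
    )

# Example (invented) demographic profile matching the variables listed above.
respondent = {
    "age": 27, "gender": "woman", "setting": "rural",
    "adm1": "Kisumu", "education": "secondary",
}
prompt = build_sdr_prompt(respondent, "How often do you use mobile internet?")
```

The point of matching each SDR one-to-one with a real respondent is that any divergence in the answers can then be attributed to the model, not to a difference in who was "asked."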
Many in the room had faced the same pressures: global funding cuts, rising demands for speed, and now the allure of AI-generated insights that promise "just as good" without ever leaving a desk. But for those of us grounded in the realities of Africa, Asia, and Latin America, the idea of simulating the truth – of replacing real people with probabilistic patterns – doesn't sit right.
This conversation, and others we had throughout the conference, affirmed a growing truth: AI will undoubtedly shape the future of research, but it should not replace real human input. At least not yet, and not in the parts of the world where truth on the ground doesn't live in neatly labeled datasets. We cannot model what we have never measured.
Why Synthetic Data Can't Replace Reality – Yet
Synthetic data is exactly what it sounds like: data that hasn't been collected from real people, but generated algorithmically based on what models assume the answers should be. In the research world, this typically involves creating simulated survey responses based on patterns identified from historical data, statistical models, or large language models (LLMs). While synthetic data can serve as a practical testing tool – and we are continually testing its utility in controlled experiments – it still falls short in several critical areas: it lacks ground truth, it misses nuance and context, and it is therefore hard to trust.
And that's precisely the problem.
In our side-by-side comparison of real survey responses and synthetic responses generated via LLMs, the differences weren't subtle – they were foundational. The models guessed wrong on major indicators like unemployment levels, digital platform usage, and even simple household demographics.
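One simple way to quantify that kind of side-by-side gap is total variation distance between the real and synthetic answer distributions for a question. The sketch below uses made-up placeholder shares, not figures from our whitepaper.

```python
# Illustrative comparison of a real vs. synthetic answer distribution for a
# single question, using total variation distance (TVD). The shares below are
# invented placeholders, not results from the GeoPoll whitepaper.

def total_variation(p: dict, q: dict) -> float:
    """TVD = half the sum of absolute differences across all categories."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)

# Hypothetical answer shares for an employment-status question.
real = {"employed": 0.38, "unemployed": 0.45, "student": 0.17}
synthetic = {"employed": 0.61, "unemployed": 0.22, "student": 0.17}

gap = total_variation(real, synthetic)  # 0 = identical, 1 = fully disjoint
```

A TVD near zero would mean the synthetic respondents mirror the real sample; large values on exactly the indicators named above are what made the differences "foundational" rather than subtle.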
I don't believe this is just a statistical issue. It's a context issue. In regions such as Africa, Asia, and Latin America, ground realities change rapidly. Behaviors, opinions, and access to services are highly local and deeply tied to culture, infrastructure, and lived experience. These are not things a language model trained predominantly on Western internet content can intuit.
Synthetic data can, indeed, be used
Synthetic data isn't inherently bad. Lest you think we're anti-tech (which we can hardly be accused of), at GeoPoll we do use synthetic data – just not as a substitute for real research. We use it to test survey logic and optimize scripts before fieldwork, simulate potential outcomes and spot logical contradictions in surveys, and experiment with framing by running parallel simulations before data collection.
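The "spot logical contradictions" use case can be sketched as a pre-fieldwork check that runs simulated responses against a survey's skip logic. The rule format and question IDs below are simplified assumptions for illustration, not GeoPoll's actual scripting tool.

```python
# Minimal sketch: validate simulated responses against skip logic before
# fieldwork. Rule format and question IDs are illustrative assumptions.

# Each rule: if the `when` question has the `equals` answer, the `then`
# question should have been skipped (left blank).
SKIP_RULES = [
    {"when": "owns_phone", "equals": "no", "then": "phone_type"},
]

def find_contradictions(response: dict) -> list:
    """Return IDs of questions answered despite an active skip rule."""
    return [
        rule["then"]
        for rule in SKIP_RULES
        if response.get(rule["when"]) == rule["equals"]
        and response.get(rule["then"]) not in (None, "")
    ]

bad = {"owns_phone": "no", "phone_type": "smartphone"}   # violates skip logic
ok = {"owns_phone": "yes", "phone_type": "smartphone"}   # consistent
```

Catching this kind of contradiction on simulated data is cheap; catching it mid-fieldwork, after real respondents have hit the broken branch, is not.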
And yes, we could generate synthetic datasets from scratch. With more than 50 million completed surveys across emerging markets, our dataset is arguably one of the most representative foundations for localized modeling.
However, we have also tested its limits, and the findings are clear: synthetic data cannot replace real, human-sourced insights in low-data environments. We don't believe it is ethical or accurate to replace fieldwork with simulations, especially when decisions about policy, funding, or aid are at stake. Synthetic data has its place. But in our view, it is not, and should not be, a shortcut for understanding real people in underrepresented regions. It's a tool to extend research, not a replacement for it.
Data Equity Starts with Inclusion – GeoPoll AI Data Streams
There is a big reason this matters. While some are racing to build the next large language model (LLM), few are asking: What data are these models trained on? And who gets represented in those datasets?
GeoPoll is in this space, too. We now work with tech companies and research institutions to provide high-quality, consented data from underrepresented languages and regions – data used to train and fine-tune LLMs. GeoPoll AI Data Streams is designed to fill the gaps where global datasets fall short – to help build more inclusive, representative, and accurate LLMs that understand the contexts they seek to serve.
Because if AI is going to be truly global, it needs to learn from the whole globe, not just guess. We must ensure that the voices of real people, especially in emerging markets, shape both decisions and the technologies of tomorrow.
Contact us to learn more about GeoPoll AI Data Streams and how we use AI to power research.










