Why LLMs Might Be the Messiest Panel You’ve Ever Used
Written by Kevin Karty, Founder and CEO of Intuify
I'm not sure which is more interesting: the domains LLMs cite the most, or how fast those citations can change.
One of the key reasons advocates like Simulated Respondents is that they supposedly carry a lower risk of bias or fraud than online panels. So, about that... First, check out the data sources for key LLM models. Consider Reddit, for example. I personally love Reddit, but it's not exactly the "average" person. Something like 99% of the content on Reddit is generated by fewer than 2% of US citizens, who are (most certainly) not the "average" person.
On top of this, anywhere from 15% to as high as 80% of posts on Reddit are bots, especially on posts intended to sway opinions (political, economic, brand-focused, etc.). And that percentage has been increasing steadily over time.
(If you think that's bad, current estimates suggest that roughly 42% of Amazon reviews, another key LLM source, are fraudulent.) And unlike real panels, you can't "clean out" simulated respondents or validate them. All of that data is mashed together to create simulated opinions.
On top of that, even models you've validated can change rapidly and without warning. In September, Google changed one parameter in its search results, limiting access to the top 20 results (which are often dominated by paid ads) instead of the top 100. That change is partly responsible for a sharp shift in citation sourcing across popular LLMs, which rely on RAG (Retrieval-Augmented Generation) to enhance their answers.
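To make that mechanism concrete, here's a minimal, self-contained sketch of how a RAG pipeline's citation pool depends on how many search results the retriever is allowed to see. Everything below is invented for illustration: `fake_search` is a stand-in, not any real search API, and the domain mix and rankings are made-up assumptions, not measured data.

```python
# Toy illustration: a RAG pipeline can only cite what the search layer exposes.
# The ranking and domain mix below are invented for illustration only.

def fake_search(query: str, max_results: int) -> list[str]:
    """Stand-in for a search API that returns URLs in rank order.
    In this toy example, top-ranked slots skew toward brand-owned pages."""
    ranked = (
        [f"https://brand{i}.example.com/landing" for i in range(1, 31)]          # ranks 1-30
        + [f"https://forum{i}.example.org/thread" for i in range(1, 41)]         # ranks 31-70
        + [f"https://independent{i}.example.net/review" for i in range(1, 31)]   # ranks 71-100
    )
    return ranked[:max_results]

def citation_pool(query: str, result_cap: int) -> set[str]:
    """Everything downstream (retrieved context, citations) is limited to this pool."""
    return set(fake_search(query, max_results=result_cap))

if __name__ == "__main__":
    deep = citation_pool("best running shoes", result_cap=100)
    shallow = citation_pool("best running shoes", result_cap=20)
    print(f"citable sources at cap=100: {len(deep)}")
    print(f"citable sources at cap=20:  {len(shallow)}")
    # Sources that disappear entirely when the cap drops from 100 to 20:
    print(f"sources no longer citable:  {len(deep - shallow)}")
```

Running it, the pool shrinks from 100 citable sources to 20, and in this toy ranking the surviving slice is entirely brand-owned pages. The LLM itself hasn't changed at all; only the retrieval cap did.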
Google is acting to protect itself from the threat of losing search traffic to LLMs by limiting the flow of information to them... Meanwhile, Google has been actively shifting the prioritization of information surfaced to LLMs to favor Google-owned sources (YouTube, for example).
Likewise, sources like brand-owned properties are increasingly given greater weight, which means LLMs tend to "answer" based on information pulled from branded websites. That's practically the definition of an echo chamber.
Anyway, in primary market research we do a lot of hand-wringing over panel quality, and we spend a lot of time cleaning data. I'll say this: at least we CAN clean the data. 🤔



