How AI Models Use Glassdoor Reviews (And Why That's a Problem)
When a candidate asks ChatGPT "What's it like to work at [Company]?", the answer that comes back sounds authoritative, balanced, and current. It isn't. It's a synthesis of Glassdoor reviews, many of them years old, blended with fragments from news articles and other web sources, presented without timestamps, caveats, or source attribution.
This is how your employer brand gets rewritten by AI — not through malice, but through the mechanics of how large language models process and reproduce information.
How AI Training Data Works
Large language models (LLMs) are trained on massive datasets of text scraped from the internet. Training data is the raw material that teaches AI models how language works and what facts exist. For employer-related queries, the most relevant training data comes from:
- Review platforms: Glassdoor, Indeed, Blind, and similar sites contribute millions of employer reviews. These are text-heavy, opinion-rich, and cover thousands of companies in detail.
- News and media: Articles about company culture, layoffs, growth, and scandals feed into AI's understanding of employers.
- Company websites: Careers pages, about pages, and blog posts, but only if they're in formats AI crawlers can access.
- Social media: LinkedIn posts, Reddit threads, and forum discussions about workplace experiences.
- Job boards: Job listing text, including role descriptions and stated benefits.
The critical issue is weighting. Review platforms contribute a disproportionate volume of employer-specific text compared to other sources. A single company might have 500 Glassdoor reviews (totalling 50,000+ words) but only 2,000 words of careers page content. In volume terms, the reviews dominate — and volume influences how AI characterises your company.
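The volume imbalance can be made concrete with the figures above. This is a minimal sketch that assumes word count is a rough proxy for how much employer-specific text each source contributes; real training pipelines deduplicate, filter, and weight data in ways that are not public.

```python
# Word counts taken from the example above. Treating word count as a proxy
# for training influence is an assumption, not a documented weighting.
review_words = 50_000       # roughly 500 Glassdoor reviews
careers_page_words = 2_000  # careers page content

total_words = review_words + careers_page_words
review_share = review_words / total_words

print(f"Reviews: {review_share:.1%} of employer-specific text")
print(f"Careers page: {1 - review_share:.1%}")
```

On these numbers, reviews account for about 96% of the text written about the employer.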
The Review Data Pipeline
Understanding how review data flows from Glassdoor into AI responses requires tracing the pipeline:
Stage 1: Web Crawling
AI companies (OpenAI, Anthropic, Google) use web crawlers to scrape publicly available text. Glassdoor reviews are public HTML text — they're trivially easy to crawl and extract. The crawlers don't distinguish between a review from 2021 and one from 2026; both are ingested as training data.
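The same crawlers read an employer's own site only where its robots.txt permits. The sketch below uses Python's standard `urllib.robotparser` to check which AI crawlers a given robots.txt would admit. The robots.txt content and the example.com URL are hypothetical; `GPTBot`, `ClaudeBot`, and `PerplexityBot` are the user-agent tokens published by OpenAI, Anthropic, and Perplexity for their crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks OpenAI's crawler, allows everyone else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

careers_url = "https://www.example.com/careers"  # hypothetical URL
access = {
    agent: parser.can_fetch(agent, careers_url)
    for agent in ["GPTBot", "ClaudeBot", "PerplexityBot"]
}

for agent, allowed in access.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

A site configured like this would be invisible to one crawler while its Glassdoor reviews remain fully readable, which tilts the balance further towards review data.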
Stage 2: Training Data Inclusion
The crawled text becomes part of the training dataset. During training, the model learns patterns and associations. If 200 reviews mention "long hours" at Company X, the model learns a strong association between Company X and long working hours — regardless of whether those reviews reflect current reality.
Stage 3: Query Response Generation
When a candidate asks about Company X, the model generates a response based on the patterns it learned during training. The 200 reviews about long hours translate into a confident statement like "Employees at Company X frequently report long working hours, though some note that work-life balance has improved in recent years."
Notice what happened: hundreds of individual opinions, spanning years, got compressed into a single authoritative-sounding paragraph. The timestamps vanished. The context vanished. The nuance vanished.
Stage 4: Reinforcement Through Repetition
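A deployed model does not retrain itself on a single conversation, so the reinforcement is indirect. When users accept this response without pushing back or asking for sources, the characterisation goes unchallenged, gets repeated in other content online, and can be crawled into later training data. The review-based narrative becomes self-reinforcing.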
The Recency Problem
The recency problem in AI review data is the tendency of AI models to treat all historical review data as equally current, regardless of when it was written.
This is perhaps the most damaging aspect of how AI uses review data. Consider a realistic scenario:
Company Y had a toxic culture under a previous CEO (2019–2022). During this period, they accumulated 150 negative Glassdoor reviews describing poor management, excessive hours, and high turnover. In 2023, a new CEO transformed the culture. Reviews from 2023–2026 are overwhelmingly positive (average 4.2 stars vs 2.8 previously).
What does AI say about Company Y in 2026? It says something like: "Working at Company Y receives mixed reviews. While some employees praise recent improvements in management and culture, others report concerns about long hours and high turnover."
The "mixed" characterisation treats 150 historical negative reviews as equivalent to 80 recent positive ones. In AI's training data, there is no timestamp that flags the older reviews as less relevant. The result: Company Y's AI employer brand is still defined by a culture that no longer exists.
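The arithmetic behind the "mixed" verdict can be sketched using Company Y's figures (150 reviews averaging 2.8 stars, 80 averaging 4.2). The per-year split and the half-life weighting are hypothetical: the weighting illustrates what a recency-aware aggregate could look like, not how any AI model is known to work.

```python
# (year, average stars, number of reviews). Totals match the scenario above;
# the split across individual years is a hypothetical illustration.
reviews = [
    (2020, 2.8, 75),
    (2021, 2.8, 75),
    (2024, 4.2, 40),
    (2025, 4.2, 40),
]

def pooled_mean(rows):
    """Average that ignores dates: every review counts equally."""
    total_stars = sum(stars * count for _, stars, count in rows)
    total_reviews = sum(count for _, _, count in rows)
    return total_stars / total_reviews

def recency_weighted_mean(rows, now=2026, half_life_years=2.0):
    """Average in which a review's weight halves every `half_life_years`."""
    weighted_stars = 0.0
    weight_sum = 0.0
    for year, stars, count in rows:
        weight = count * 0.5 ** ((now - year) / half_life_years)
        weighted_stars += stars * weight
        weight_sum += weight
    return weighted_stars / weight_sum

print(f"Date-blind average:       {pooled_mean(reviews):.2f}")
print(f"Recency-weighted average: {recency_weighted_mean(reviews):.2f}")
```

The date-blind average lands at roughly 3.3 stars, squarely in "mixed" territory, even though every review from the current era averages 4.2.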
OpenRole's analysis of the outdated reviews problem found this pattern across hundreds of UK employers. Companies that underwent significant positive change in the last 3–5 years are disproportionately harmed by how AI uses historical review data.
The Verification Gap
The verification gap is the absence of fact-checking between what review data claims and what AI presents as established fact.
When a Glassdoor reviewer writes "The salary for senior engineers is around £65,000", this is one person's claim based on their experience. When AI absorbs this into its training data and later tells a candidate "Senior engineers at [Company] typically earn around £65,000", it's presenting an unverified individual claim as a factual salary range.
The verification gap manifests in several ways:
| Review claim type | Verification by AI | Risk to employer |
|---|---|---|
| Salary figures | ❌ Not verified | AI states incorrect salary ranges |
| Benefits claims | ❌ Not verified | AI omits or fabricates benefits |
| Culture descriptions | ❌ Not verified | AI perpetuates outdated characterisations |
| Management quality | ❌ Not verified | AI attributes views to "employees" broadly |
| Growth claims | ❌ Not verified | AI states unverified career progression data |
| Policy statements | ❌ Not verified | AI states outdated remote/hybrid policies |
Research from OpenRole confirms that AI salary accuracy is just 38% for employers that don't publish salary data — and review-derived salary figures are the primary source of these inaccuracies.
Evidence: Review Data in AI Responses
To understand the scale of this problem, OpenRole analysed AI responses about 100 UK employers across ChatGPT, Claude, and Perplexity. The findings were stark:
Linguistic Fingerprints
AI responses frequently contain phrases that are characteristic of review platform language rather than corporate communications:
- "Pros and cons" framing (directly mirrors Glassdoor's review structure)
- "Some employees report..." (aggregating individual reviews into collective claims)
- "Work-life balance is a concern for some" (the classic Glassdoor complaint, reproduced verbatim)
- "Management can be hit or miss" (review platform phrasing, not corporate language)
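A simple way to spot these fingerprints in an AI answer is a phrase scan. This is a minimal sketch: the pattern list is drawn from the phrases above, the sample answer is hypothetical, and a real audit would need a broader, tuned list.

```python
import re

# Phrases listed above as characteristic of review-platform language.
FINGERPRINTS = [
    r"pros and cons",
    r"some employees report",
    r"work-life balance",
    r"hit or miss",
]

def find_fingerprints(answer: str) -> list[str]:
    """Return the fingerprint patterns that appear in an AI answer."""
    return [p for p in FINGERPRINTS if re.search(p, answer, re.IGNORECASE)]

# Hypothetical AI answer used for illustration.
sample = ("Some employees report long hours, and management can be "
          "hit or miss, though benefits are competitive.")
print(find_fingerprints(sample))
```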
Source Attribution Analysis
When asked to cite sources, AI models frequently reference Glassdoor directly. In OpenRole's analysis:
- ChatGPT cited Glassdoor as a source in 67% of employer-related responses when pressed for references
- Perplexity linked to Glassdoor reviews in 78% of employer queries (Perplexity always shows sources)
- Claude referenced "employee reviews" or "review platforms" in 54% of employer culture descriptions
Sentiment Correlation
OpenRole compared AI sentiment about employers with their Glassdoor ratings. The correlation coefficient was 0.82, which is very high. Correlation does not prove causation, but the result is consistent with Glassdoor review sentiment being the dominant driver of AI's characterisation of employer culture.
By contrast, the correlation between AI sentiment and the employers' own stated culture (from careers pages) was just 0.31. AI's version of your culture is built from Glassdoor, not from you.
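For readers who want to run the same kind of comparison on their own data, a Pearson correlation can be computed as below. The scores are hypothetical placeholders, not OpenRole's dataset, and the scoring of AI sentiment on a 1 to 5 scale is an assumption.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)

# Hypothetical scores for five employers, both on a 1-5 scale.
glassdoor_rating = [2.8, 3.4, 3.9, 4.2, 4.5]
ai_sentiment = [2.5, 3.1, 3.3, 4.0, 4.1]

print(f"r = {pearson(glassdoor_rating, ai_sentiment):.2f}")
```

A value near 1 means the two scores rise and fall together; a value near 0 means they are largely unrelated.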
The Negativity Amplification Effect
Review platforms have a well-documented negativity bias: dissatisfied employees are 2–3x more likely to leave reviews than satisfied ones. This negativity bias flows directly into AI training data.
The amplification happens because:
- Negative reviews are longer. Dissatisfied reviewers write more detailed reviews, meaning more text for AI to train on. A scathing 500-word review generates more training signal than a brief "Great place to work!" 5-star review.
- Negative reviews use more specific language. They cite specific incidents, name departments, and describe problems in detail. AI latches onto specificity because it provides richer training data.
- Negative sentiment is more memorable in language. Strong negative language ("toxic culture", "terrible management") is distinctive and tends to create stronger associations than moderate positive language ("nice office", "decent benefits").
The result is that AI's characterisation of most employers skews slightly negative compared to the employer's actual current employee experience. According to OpenRole's industry benchmarks, the average AI sentiment score for UK employers is 0.4 points lower than their current Glassdoor rating — suggesting systematic negativity amplification.
What Employers Can Do About It
You cannot edit AI's training data. You cannot remove old Glassdoor reviews from AI models' knowledge. But you can take specific actions to counterbalance the review data effect.
1. Publish Authoritative First-Party Data
The most effective counterweight to review-derived data is authoritative first-party data published in AI-accessible formats.
- Publish current salary ranges on your careers page using JobPosting schema markup. When AI has authoritative salary data from you, it's less likely to rely on estimates from reviews.
- Publish a comprehensive FAQ addressing the exact questions candidates ask AI. Use FAQPage schema so AI can extract your answers directly.
- Write detailed culture content in plain HTML on your website. Not a PDF. Not a video. Text that AI crawlers can read.
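As an illustration of the first point, the sketch below builds JobPosting markup as JSON-LD using schema.org's `JobPosting`, `MonetaryAmount`, and `QuantitativeValue` types. The company, role, location, and date are hypothetical, and the salary range is the illustrative one used in the FAQ at the end of this article.

```python
import json

# Hypothetical role at a hypothetical employer; replace with real values.
job_posting = {
    "@context": "https://schema.org",
    "@type": "JobPosting",
    "title": "Senior Software Engineer",
    "description": "Build and maintain our core platform. Hybrid, London.",
    "datePosted": "2026-03-01",
    "employmentType": "FULL_TIME",
    "hiringOrganization": {
        "@type": "Organization",
        "name": "Example Ltd",
        "sameAs": "https://www.example.com",
    },
    "jobLocation": {
        "@type": "Place",
        "address": {
            "@type": "PostalAddress",
            "addressLocality": "London",
            "addressCountry": "GB",
        },
    },
    "baseSalary": {
        "@type": "MonetaryAmount",
        "currency": "GBP",
        "value": {
            "@type": "QuantitativeValue",
            "minValue": 55000,
            "maxValue": 75000,
            "unitText": "YEAR",
        },
    },
}

# Wrap the JSON-LD in the script tag that goes into the page's HTML.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(job_posting, indent=2)
    + "\n</script>"
)
print(script_tag)
```

The printed tag goes in the HTML of the job listing page, alongside the human-readable version of the same salary range.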
2. Create an llms.txt File
An llms.txt file lets you provide AI models with a structured briefing about your company. Include current information about culture, benefits, salary approach, and any recent changes. This gives AI an authoritative source to cite instead of relying solely on reviews.
Use OpenRole's free llms.txt generator to create one in minutes.
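For a sense of what such a file contains, here is a minimal sketch that assembles one by hand. It follows the layout of the llms.txt proposal (a title, a short blockquote summary, then sections of annotated links). The company, URLs, and wording are hypothetical, and there is no guarantee that any particular AI crawler reads the file.

```python
def build_llms_txt(name, summary, sections):
    """Assemble llms.txt content: title, blockquote summary, link sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines)

# Hypothetical employer details used for illustration.
content = build_llms_txt(
    name="Example Ltd",
    summary="UK software company. Information current as of March 2026.",
    sections={
        "Working here": [
            ("Life at Example Ltd in 2026",
             "https://www.example.com/careers/life",
             "current culture, hybrid policy, and benefits"),
            ("Salary ranges",
             "https://www.example.com/careers/salaries",
             "published pay bands by role"),
        ],
    },
)
print(content)
```

The output would be saved as `llms.txt` at the root of the website.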
3. Address the Recency Problem Directly
If your company has undergone significant positive change, you need to make that change visible to AI:
- Publish a "Life at [Company] in 2026" page with current information
- Include dates in your content ("As of March 2026, our hybrid policy is...")
- Reference the change explicitly ("Since our cultural transformation in 2023...")
- Ensure recent content is published frequently so AI crawlers pick it up
4. Monitor AI Responses Regularly
Run regular AI employer brand audits to see how AI characterises your company. Track changes over time. If AI is still citing outdated information, you need more first-party content to counterbalance it.
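A lightweight version of such an audit can be scripted. The sketch below assumes you paste in (or otherwise collect) an AI answer, then checks it against a list of current facts you expect to see and outdated phrases you hope not to see. All names, phrases, and the sample answer are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class AuditResult:
    run_on: date
    model: str
    missing_facts: list   # current facts the answer failed to mention
    outdated_hits: list   # outdated phrases the answer still contains

def audit_answer(answer, model, run_on, current_facts, outdated_phrases):
    """Compare one AI answer against expected and unwanted phrases."""
    text = answer.lower()
    missing = [f for f in current_facts if f.lower() not in text]
    outdated = [p for p in outdated_phrases if p.lower() in text]
    return AuditResult(run_on, model, missing, outdated)

# Hypothetical answer collected from a manual query.
answer = ("Employees at Example Ltd frequently report long working hours. "
          "The company operates a hybrid policy.")

result = audit_answer(
    answer,
    model="example-model",
    run_on=date(2026, 3, 1),
    current_facts=["hybrid policy", "4-day work week"],
    outdated_phrases=["long working hours", "high turnover"],
)
print(result)
```

Storing one result per model per month gives a simple trend line: the lists should shrink as first-party content takes hold.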
5. Respond Strategically on Review Platforms
Your Glassdoor responses become part of AI training data too. When responding to reviews:
- Include current, factual information about your company
- Reference specific improvements or changes
- Use keywords that match what candidates ask AI ("salary range", "remote work policy", "career progression")
- Keep responses substantive — brief responses contribute less training data
6. Implement Schema Markup
Structured data gives AI machine-readable facts about your company. When AI has structured data from your website, it's more likely to cite those facts than synthesised review data. Key schema types:
- Organization schema (company facts; schema.org uses the US spelling for the type name)
- JobPosting schema (salary ranges, benefits)
- FAQPage schema (direct answers to candidate questions)
- AggregateRating schema (your own aggregate ratings)
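As one example from this list, FAQPage markup can be generated from a plain list of questions and answers. This sketch uses schema.org's `FAQPage`, `Question`, and `Answer` types; the questions and answers themselves are hypothetical.

```python
import json

# Hypothetical candidate questions with dated, factual answers.
faqs = [
    ("What is the salary range for senior engineers?",
     "As of March 2026, the range is £55,000 to £75,000."),
    ("What is the remote work policy?",
     "As of March 2026, we operate a hybrid policy."),
]

faq_page = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        }
        for question, answer in faqs
    ],
}

# ensure_ascii=False keeps the pound sign readable in the output.
faq_json_ld = json.dumps(faq_page, indent=2, ensure_ascii=False)
print(faq_json_ld)
```

The JSON-LD is embedded in the FAQ page inside a `<script type="application/ld+json">` tag, and the same questions and answers should also appear as visible text on the page.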
The Future: Will This Get Better?
The relationship between review platforms and AI is evolving. Several trends suggest partial improvement:
More real-time data access. As AI models increasingly use live web browsing rather than static training data, the recency problem will diminish. ChatGPT with browsing and Perplexity already pull live data, meaning current information can override historical training data.
Better source quality signals. AI models are getting better at evaluating source authority. First-party employer data (from your website) is increasingly weighted higher than third-party reviews.
Structured data adoption. As more employers publish schema markup, AI will have more authoritative data to work with, reducing reliance on review-derived information.
However, the fundamental issue remains: review platforms will always contribute a massive volume of employer-specific text, and that volume will always influence AI's characterisation of employers. The solution isn't to wait for AI to get smarter — it's to ensure your first-party data is comprehensive, current, and AI-accessible.
The Bottom Line
AI models use Glassdoor reviews as a primary source for employer information because reviews are text-rich, voluminous, and easy to crawl. The reviews become current "facts" in AI responses regardless of when they were written. Negative reviews are amplified. Salary claims go unverified. And employers who don't publish their own AI-accessible data are defined entirely by their review history.
The employers who are winning in AI search — those ranking highest in the UK AI employer visibility index — are the ones who have flooded the AI ecosystem with authoritative, structured, current data about who they are. Reviews still exist in AI's training data, but they're counterbalanced by first-party information that AI can verify and cite.
Don't let Glassdoor reviews from 2021 define your employer brand in 2026. Run a free audit at openrole.co.uk to see what AI is saying about you today.
Frequently Asked Questions
Q: Can I get Glassdoor reviews removed from AI training data?
A: No. Once review data has been ingested into an AI model's training dataset, it cannot be selectively removed. AI companies do not offer mechanisms for employers to request removal of specific training data. The only effective strategy is to publish authoritative first-party data that counterbalances the review data. Over time, as models retrain and incorporate your new data, the influence of old reviews will diminish.
Q: Does responding to Glassdoor reviews affect what AI says?
A: Yes, to a degree. Your responses are also crawled and included in training data. A substantive response that includes current facts ("As of 2026, we offer a salary range of £55,000–£75,000 for this role and have implemented a 4-day work week") becomes part of the data AI uses. Keep responses factual and keyword-rich rather than defensive or generic.
Q: How much of AI's employer information comes from Glassdoor specifically?
A: Based on OpenRole's analysis, Glassdoor is the single largest source of employer culture and compensation data in AI responses, contributing to approximately 40–60% of culture-related statements and 30–50% of salary-related statements. Indeed reviews contribute an additional 15–25%. The remainder comes from news articles, company websites, and other sources.
Q: Is this a GDPR issue? Can I take legal action?
A: The legal landscape around AI training data and employer reviews is still evolving. Glassdoor reviews are user-generated public content, and AI companies generally rely on fair use or equivalent arguments for training data. Some employers have explored legal challenges, but as of 2026, no UK employer has successfully compelled an AI company to remove review-derived training data. The practical approach remains publishing authoritative first-party data rather than pursuing legal remedies.
Q: Does this affect all AI models equally?
A: No. Models that use live web browsing (Perplexity, ChatGPT with browsing enabled) can access current information and are less reliant on historical training data. Models that rely solely on pre-trained knowledge (for example, ChatGPT answering without browsing) are more heavily influenced by historical review data. Google AI Overviews pull from live search results, making them more responsive to current data. This is why publishing current, crawlable content on your website is the highest-priority action.