Technical Whitepaper
How echotest constructs, simulates, and calibrates synthetic audience responses across 195 markets using census-anchored archetypes and multi-round debate dynamics.
Version 2.0 · April 2026 · echotest Research
Contents
1. The 10-Layer Intelligence Stack
2. Archetype Construction Methodology
3. Simulation Pipeline
4. Calibration Framework
5. Validation: Prediction vs. Reality
6. Comparison to Traditional Research
7. Limitations & Honest Uncertainty
8. Data Sources & Citations
1. The 10-Layer Intelligence Stack
Every echotest simulation passes through a 10-layer pipeline. Each layer adds a dimension of behavioral realism that generic AI tools cannot replicate. The layers are cumulative: each builds on the output of the previous one.
| Layer | Name | What It Captures | Source |
|---|---|---|---|
| 1 | Demographics | WHO they are | 2.5M archetypes from UN population data |
| 2 | Cultural DNA | HOW they think | Hofstede 6D + World Values Survey |
| 3 | Transaction DNA | CAN they buy | 35 behavioral fields per archetype |
| 4 | Persona Intel | WHY they buy | JTBD framework + life stage + brand affinity |
| 5 | Brand Intelligence | WHAT they trust | GPT analysis + resonance scoring |
| 6 | OCEAN Personality | WHO they really are | Big Five derived from Hofstede + DNA |
| 7 | Identity Modulation | CONTEXT shifts behavior | 6 modes: professional, family, luxury, impulse, health, tech |
| 8 | Agent Memory | HISTORY matters | Persistent experience across simulations |
| 9 | Emergence Engine | UNEXPECTED insights | Free-play discovery round before debate |
| 10 | Stochastic Noise | REAL human messiness | Controlled randomness in decision-making |
Layers 1–6 are pre-computed for all 2.5 million archetypes. Layers 7–10 are applied at simulation time based on the specific content, product, or policy being tested. This means every simulation is contextually modulated — the same archetype reacts differently to a luxury product vs. a budget SaaS tool.
2. Archetype Construction Methodology
echotest maintains a database of 2,521,382 synthetic archetypes distributed across 195 UN member states. Each country's archetype count is proportional to its population, so India (1.4B people) has proportionally more archetypes than Luxembourg (650K).
Within each country, archetypes are stratified across 5 demographic dimensions:
Source: UN World Population Prospects 2024, World Bank World Development Indicators
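As a rough illustration of population-proportional allocation, the sketch below splits a fixed archetype budget across countries. The largest-remainder rounding, the function name `allocate_archetypes`, and the budget of 10,000 are assumptions made for the example; only the two population figures come from the text above.

```python
from math import floor


def allocate_archetypes(populations, total):
    """Split `total` archetypes across countries in proportion to population.

    Uses largest-remainder rounding (an assumption) so counts sum to `total`.
    """
    world = sum(populations.values())
    quotas = {c: total * p / world for c, p in populations.items()}
    counts = {c: floor(q) for c, q in quotas.items()}
    leftover = total - sum(counts.values())
    # Give the remaining archetypes to the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for country in by_remainder[:leftover]:
        counts[country] += 1
    return counts


# Population figures quoted in the text; the budget is illustrative.
populations = {"India": 1_400_000_000, "Luxembourg": 650_000}
counts = allocate_archetypes(populations, total=10_000)
```

A production allocator would probably also enforce a minimum count per country so that small states remain analyzable; that floor is not modeled here.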
A country average hides more than it reveals. The person browsing on Palm Jumeirah doesn't buy like the person in Deira. A Riyadh executive doesn't shop like a Hofuf factory worker. A Mumbai Bandra resident doesn't make the same decisions as someone in Dharavi. Treating them all as “the average Indian / Saudi / Emirati consumer” throws away the very variation that decides whether your product lands or stalls.
In our key markets we build separate audiences for distinct neighborhoods — their own income mix, religious composition, language, brand reach, and cultural texture. So when you test a luxury launch, the panel that responds is shaped by the streets it would actually sell on, not a country-wide mean that smooths the signal away.
A few of the contrasts we model:
Palm Jumeirah vs Deira
Minato vs Adachi
Bandra vs Dharavi
Sulamaniyah vs Hofuf
Kensington vs Tower Hamlets
Upper East Side vs South Bronx
For markets we haven't gone deep on yet, the panel falls back to country-level distributions — still population-proportional, still culturally anchored, just without the neighborhood split.
Each archetype inherits cultural dimensions from its country, sourced from the Hofstede Institute's 6-Dimension model and the World Values Survey. These dimensions shape how archetypes process information, respond to authority, handle uncertainty, and make purchasing decisions.
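Power Distance (PDI): Acceptance of hierarchical authority
Individualism (IDV): Self vs. group orientation
Masculinity (MAS): Achievement vs. quality of life
Uncertainty Avoidance (UAI): Comfort with ambiguity and risk
Long-Term Orientation (LTO): Planning horizon and tradition
Indulgence (IVR): Impulse control and gratification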
Source: Hofstede Insights (hofstede-insights.com), World Values Survey Wave 7 (2017-2022)
Beyond demographics and culture, each archetype carries 35+ behavioral fields that model purchasing psychology. These are computed from the intersection of income, education, cultural dimensions, and sector-specific modifiers.
Disposable income, price sensitivity, luxury vs. value, savings rate
Impulse score, research depth, social proof need, FOMO, risk aversion
Switching cost, deal sensitivity, referral propensity, review tendency
Online vs. offline, mobile comfort, ad receptivity, cart abandonment risk
Each archetype's Big Five personality profile is computed mathematically from Hofstede cultural dimensions + Transaction DNA fields. Zero LLM calls are used in personality computation — it is a pure mathematical derivation that runs deterministically.
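The simplified derivation formula in this section can be written as a pure function, which is one way to see why no LLM call is needed. This is a sketch under two assumptions: Hofstede scores arrive on a 0-100 scale, and every Transaction DNA field is already normalized to 0-1. The function name `ocean_profile` and the dictionary layout are hypothetical; the weights and field names are copied from the formula.

```python
def ocean_profile(hofstede, dna):
    """Derive Big Five scores from Hofstede scores (0-100) and DNA fields (0-1)."""
    uai, idv, mas, lto, ivr = (
        hofstede[k] / 100 for k in ("UAI", "IDV", "MAS", "LTO", "IVR")
    )
    return {
        "openness": 0.35 * (1 - uai) + 0.30 * dna["novelty_seeking"]
        + 0.20 * idv + 0.15 * dna["tech_adoption"],
        "conscientiousness": 0.30 * uai + 0.25 * dna["research_depth"]
        + 0.25 * lto + 0.20 * dna["savings_rate"],
        "extraversion": 0.30 * ivr + 0.25 * dna["influence_score"]
        + 0.20 * dna["social_proof"] + 0.25 * dna["engagement"],
        "agreeableness": 0.30 * (1 - idv) + 0.25 * (1 - mas)
        + 0.20 * dna["susceptibility"] + 0.25 * (1 - dna["risk"]),
        "neuroticism": 0.35 * uai + 0.25 * dna["risk_aversion"]
        + 0.20 * dna["loss_aversion"] + 0.20 * (1 - ivr),
    }


# Neutral inputs: every Hofstede score at 50 and every DNA field at 0.5.
neutral_hofstede = {k: 50 for k in ("UAI", "IDV", "MAS", "LTO", "IVR")}
neutral_dna = {
    k: 0.5
    for k in (
        "novelty_seeking", "tech_adoption", "research_depth", "savings_rate",
        "influence_score", "social_proof", "engagement", "susceptibility",
        "risk", "risk_aversion", "loss_aversion",
    )
}
profile = ocean_profile(neutral_hofstede, neutral_dna)
```

Because the weights in each trait sum to 1.0, inputs in the 0-1 range always produce trait scores in the 0-1 range, and identical inputs always produce identical profiles.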
Derivation Formula (simplified):
Openness = 0.35×(1-UAI/100) + 0.30×novelty_seeking + 0.20×(IDV/100) + 0.15×tech_adoption
Conscientiousness = 0.30×(UAI/100) + 0.25×research_depth + 0.25×(LTO/100) + 0.20×savings_rate
Extraversion = 0.30×(IVR/100) + 0.25×influence_score + 0.20×social_proof + 0.25×engagement
Agreeableness = 0.30×(1-IDV/100) + 0.25×(1-MAS/100) + 0.20×susceptibility + 0.25×(1-risk)
Neuroticism = 0.35×(UAI/100) + 0.25×risk_aversion + 0.20×loss_aversion + 0.20×(1-IVR/100)
3. Simulation Pipeline
When a user submits content for simulation, the following pipeline executes:
LLM classifies content type, topic, tone, controversy potential, cultural sensitivity signals, and target audience inference.
Stratified population-proportional sampling selects agents from PostgreSQL. Qdrant vector search boosts with content-relevant archetypes. Neo4j ensures high-influence nodes are included.
Each agent receives their full persona prompt including cultural dimensions, behavioral DNA, OCEAN traits, real-time intelligence (news, events), and identity modulation based on content type.
Round 1: Independent reaction (no social influence). Round 2: Confrontation — agents see peer arguments and can shift position. Round 3: Neo4j cascade — influence propagates through the social graph.
Raw LLM responses are refined to match each archetype's communication style, formality, and platform conventions.
Sentiment aggregation, SWOT analysis, psychological insights (JTBD, cognitive biases), demographic breakdowns, NPS, virality scoring, price elasticity, and revenue forecasting.
If the user has past feedback loops (Level 2+), historical accuracy data adjusts sentiment predictions, segment weights, and confidence scores.
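The three debate rounds described in the pipeline can be pictured with a toy numeric model. This is only a sketch: in the described system an LLM produces arguments and Neo4j holds the social graph, whereas here each agent is reduced to a single stance number and the graph is a plain dictionary. The `Agent` fields, the averaging rules, the `pull` factor, and the 0.1 shift threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    stance: float          # Round 1 result: -1 (rejects) to +1 (embraces)
    susceptibility: float  # 0 to 1: how far peers can move this agent


def confrontation(agents):
    """Round 2: each agent shifts toward the mean stance of its peers."""
    total = sum(a.stance for a in agents)
    shifted = {}
    for a in agents:
        peer_mean = (total - a.stance) / (len(agents) - 1)
        shifted[a.name] = a.stance + a.susceptibility * (peer_mean - a.stance)
    return shifted


def cascade(stances, follows, pull=0.5):
    """Round 3: each agent shifts toward the mean stance of those it follows."""
    result = dict(stances)
    for name, sources in follows.items():
        if sources:
            source_mean = sum(stances[s] for s in sources) / len(sources)
            result[name] = stances[name] + pull * (source_mean - stances[name])
    return result


def shift_rate(before, after, threshold=0.1):
    """Share of agents whose stance moved by more than `threshold`."""
    moved = sum(1 for n in before if abs(after[n] - before[n]) > threshold)
    return moved / len(before)


agents = [
    Agent("skeptic", -0.8, 0.2),
    Agent("neutral", 0.0, 0.5),
    Agent("fan", 0.9, 0.1),
]
round1 = {a.name: a.stance for a in agents}
round2 = confrontation(agents)
round3 = cascade(round2, follows={"neutral": ["fan"], "skeptic": [], "fan": []})
rate = shift_rate(round1, round3)
```

A shift rate of this kind is one plausible input to a debate-quality measure, since it records whether agents actually changed position between the independent round and the end of the cascade.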
Total simulation time for 200 agents: ~60-90 seconds. For 2,000 agents: ~5-8 minutes. Results are streamed in real-time via WebSocket as each stage completes.
4. Calibration Framework
echotest does not claim oracle-level accuracy. Instead, every prediction is accompanied by explicit confidence metrics, reliability grades, and calibration level indicators. The system is designed to be honestly uncertain and to improve over time.
Confidence scores (0–100%) are computed from four weighted components:
| Component | Weight | How It's Computed |
|---|---|---|
| Sample Adequacy | 30% | Agent count: 500+ = full marks, <50 = minimum |
| Country Coverage | 25% | More countries = higher geographic representativeness |
| Demographic Coverage | 25% | Unique age groups × gender diversity in sample |
| Debate Quality | 20% | Position shift rate — did agents actually change their minds? |
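The weighting in the table can be sketched as a simple weighted sum. The weights below are taken from the table; everything else is an assumption for illustration: that each component is first scored on a 0-1 scale, that sample adequacy rises linearly between 50 and 500 agents, and that the minimum score is 0.1.

```python
# Weights copied from the component table.
WEIGHTS = {
    "sample_adequacy": 0.30,
    "country_coverage": 0.25,
    "demographic_coverage": 0.25,
    "debate_quality": 0.20,
}


def sample_adequacy(agent_count, minimum=0.1):
    """500+ agents = full marks, under 50 = minimum; linear in between (assumed)."""
    if agent_count >= 500:
        return 1.0
    if agent_count < 50:
        return minimum
    return minimum + (1.0 - minimum) * (agent_count - 50) / (500 - 50)


def confidence_score(components):
    """Combine component scores (each 0-1) into a 0-100% confidence score."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return 100 * sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


example = confidence_score({
    "sample_adequacy": sample_adequacy(200),
    "country_coverage": 0.6,      # illustrative value
    "demographic_coverage": 0.8,  # illustrative value
    "debate_quality": 0.5,        # illustrative value
})
```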
Every report receives a reliability grade based on sample size, using Wilson score confidence intervals:
| Sample size | Margin of error |
|---|---|
| 500+ agents | ±3.5% |
| 200-500 agents | ±5-7% |
| 100-200 agents | ±7-10% |
| 50-100 agents | ±10-14% |
| <50 agents | ±14%+ |
Based on Wilson score interval for binomial proportions (Wilson, 1927). Margin of error computed at 95% confidence level.
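For reference, the Wilson score interval for a binomial proportion can be computed as follows. The formula is the standard one from Wilson (1927); the function name and the choice of z = 1.96 for a 95% confidence level are the only choices made here, and how echotest maps the result to a grade is not specified in this document.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Return (low, high) of the Wilson score interval for successes / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Example: 50 positive reactions out of 100 agents.
low, high = wilson_interval(50, 100)
margin = (high - low) / 2
```

The interval is widest when the observed proportion is 0.5, so margins quoted per sample size are best read as worst-case values; a strongly one-sided result from the same number of agents has a narrower interval.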
The system improves per customer through a closed-loop feedback mechanism. Users who feed real campaign results back receive progressively more accurate predictions.
| Level | Name | Requirement | What Changes |
|---|---|---|---|
| 0 | New | Default | Pure simulation, generic benchmarks only |
| 1 | Benchmarked | 1st simulation | Industry benchmarks applied to scoring |
| 2 | Calibrating | 3+ outcome reports | User-specific sentiment/segment adjustments from historical accuracy data |
| 3 | Calibrated | 10+ outcomes, >70% avg accuracy | ML-optimized weights per segment, per country, per content type |
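The level requirements in the table reduce to a short rule chain. The sketch below assumes accuracy is expressed as a fraction between 0 and 1 and that levels are checked from highest to lowest; the function and parameter names are hypothetical.

```python
def calibration_level(simulations_run, outcome_reports, avg_accuracy):
    """Map a customer's history to Calibration Level 0-3 per the table."""
    if outcome_reports >= 10 and avg_accuracy > 0.70:
        return 3  # Calibrated
    if outcome_reports >= 3:
        return 2  # Calibrating
    if simulations_run >= 1:
        return 1  # Benchmarked
    return 0      # New
```

Note that under this reading a customer with 10 or more outcome reports but average accuracy at or below 70% stays at Level 2 until accuracy improves.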
The Lifecycle Simulator (Engine 5) uses a market-bucket calibration system that tracks systematic prediction deltas per (country, category, dampening mode). Each completed loop improves the bucket's rolling stats and sharpens future predictions for the same market.
LIVE CALIBRATION LOOP
Each calibration bucket is keyed by (country, category, dampening_mode). After 7 verified outcomes, the SA × streaming bucket reaches high confidence and recommends pre-filled values for every future simulation.
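A calibration bucket of this kind can be sketched as a small rolling store of prediction errors keyed by the (country, category, dampening_mode) tuple. The class below is illustrative only: the window length, the use of a mean delta as the correction, and the mode name "standard" are assumptions, while the threshold of 7 verified outcomes mirrors the example in the text.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

WINDOW = 20          # assumed rolling-window length
HIGH_CONFIDENCE = 7  # mirrors the "7 verified outcomes" example


@dataclass
class CalibrationBucket:
    # Each entry is (actual - predicted) for one verified outcome.
    deltas: deque = field(default_factory=lambda: deque(maxlen=WINDOW))

    def record(self, predicted: float, actual: float) -> None:
        self.deltas.append(actual - predicted)

    @property
    def mean_delta(self) -> float:
        return sum(self.deltas) / len(self.deltas) if self.deltas else 0.0

    @property
    def confidence(self) -> str:
        return "high" if len(self.deltas) >= HIGH_CONFIDENCE else "low"

    def adjust(self, predicted: float) -> float:
        """Shift a new prediction by the bucket's systematic error."""
        return predicted + self.mean_delta


buckets = defaultdict(CalibrationBucket)
key = ("SA", "streaming", "standard")  # "standard" is a placeholder mode name
for _ in range(7):
    buckets[key].record(predicted=0.60, actual=0.50)
adjusted = buckets[key].adjust(0.60)
```

Keeping one bucket per key means a systematic over-prediction in one market does not leak into the corrections applied to another.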
5. Validation: Prediction vs. Reality
echotest predictions are continuously backtested against actual campaign outcomes. When users submit real-world results through the Decision Intelligence system, the comparison engine computes:
Sample Validation (Early Access)
Avg Sentiment Accuracy (across 50+ backtested campaigns)
Avg Conversion Accuracy (commerce simulations only)
Improvement at Level 2+ (vs. Level 0 baseline)
6. Comparison to Traditional Research
echotest is not a replacement for all market research. It is a complement that excels at speed, breadth, and early-stage hypothesis testing. Here is an honest comparison:
| Dimension | Traditional Research | Generic AI (ChatGPT) | echotest |
|---|---|---|---|
| Speed | 2-6 weeks | Minutes (unstructured) | 60-90 seconds (200 agents) to 5-8 minutes (2,000 agents) |
| Cost per study | $10K-$100K | $0-$20 | $10-$20 per simulation |
| Market coverage | 1-3 countries | Generic (no real data) | 195 countries |
| Cultural depth | Basic demographics | None | 6 Hofstede dims + WVS + 35 DNA fields |
| Sample size | 500-2,000 real humans | 1 model | 50-2,000 structured archetypes |
| Repeatability | Low (different respondents) | Low (temperature variance) | High (deterministic archetypes) |
| Learns over time | No | No | Yes (Calibration Levels 0-3) |
| Statistical rigor | High (real data) | None | Moderate (Wilson intervals, reliability grades) |
| Ideal use case | Final validation | Quick brainstorming | Pre-launch stress testing |
When to use echotest: Before you commit budget. Test 10 ideas in the time it takes to brief one focus group. Narrow your options, then validate the winner with traditional research if the stakes justify it.
7. Limitations & Honest Uncertainty
Scientific honesty is a core principle of echotest. We explicitly acknowledge the following limitations:
Synthetic, not real
Archetypes are statistical constructs, not real humans. They cannot capture individual lived experience, recent personal events, or genuine emotional states.
LLM-dependent behavior
Agent responses are generated by large language models. While culturally calibrated through persona prompts, they inherit LLM biases and limitations.
No causal claims
echotest identifies correlations and directional indicators. It does not establish causal relationships between content and outcomes.
Cultural dimensions are averages
Hofstede scores represent national averages. Individual variation within countries can be enormous. Sub-regional overrides partially address this for diverse nations (India, Nigeria, etc.).
Calibration requires feedback
The system only improves if users submit actual results. Without feedback loops, predictions remain at Level 0 (uncalibrated).
Every echotest report includes a disclaimer: “Results are directional indicators based on synthetic agent responses. Not a statistical prediction. Real-world outcomes may differ. Use results as one input among many in your decision-making process.”
8. Data Sources & Citations
United Nations, Department of Economic and Social Affairs, Population Division (2024). World Population Prospects 2024.
World Bank. World Development Indicators (2024). GDP per capita, internet penetration, Gini index, literacy rates.
Hofstede, G. (2011). Dimensionalizing Cultures: The Hofstede Model in Context. Online Readings in Psychology and Culture, 2(1). Data: hofstede-insights.com
Inglehart, R. et al. World Values Survey Wave 7 (2017-2022). worldvaluessurvey.org
McCrae, R.R. & John, O.P. (1992). An Introduction to the Five-Factor Model and Its Applications. Journal of Personality, 60(2), 175-215.
Christensen, C.M. et al. (2016). Know Your Customers’ "Jobs to Be Done." Harvard Business Review.
Wilson, E.B. (1927). Probable Inference, the Law of Succession, and Statistical Inference. Journal of the American Statistical Association, 22(158), 209-212.
DataReportal (2024). Digital 2024 Global Overview Report. Social media penetration and platform-specific demographics.
Reporters Without Borders (2024). World Press Freedom Index.
© 2026 echotest. All rights reserved.