Technical Whitepaper

Methodology & Scientific Framework

How echotest constructs, simulates, and calibrates synthetic audience responses across 195 markets using census-anchored archetypes and multi-round debate dynamics.

Version 2.0 · April 2026 · echotest Research

Contents

1. The 10-Layer Intelligence Stack

2. Archetype Construction Methodology

3. Simulation Pipeline

4. Calibration Framework

5. Validation: Prediction vs. Reality

6. Comparison to Traditional Research

7. Limitations & Honest Uncertainty

8. Data Sources & Citations

1. The 10-Layer Intelligence Stack

Every echotest simulation passes through a 10-layer pipeline. Each layer adds a dimension of behavioral realism that generic AI tools cannot replicate. The layers are cumulative — each builds on the output of the previous one.

Layer | Name | What It Captures | Source
1 | Demographics | WHO they are | 2.5M archetypes from UN population data
2 | Cultural DNA | HOW they think | Hofstede 6D + World Values Survey
3 | Transaction DNA | CAN they buy | 35 behavioral fields per archetype
4 | Persona Intel | WHY they buy | JTBD framework + life stage + brand affinity
5 | Brand Intelligence | WHAT they trust | GPT analysis + resonance scoring
6 | OCEAN Personality | WHO they really are | Big Five derived from Hofstede + DNA
7 | Identity Modulation | CONTEXT shifts behavior | 6 modes: professional, family, luxury, impulse, health, tech
8 | Agent Memory | HISTORY matters | Persistent experience across simulations
9 | Emergence Engine | UNEXPECTED insights | Free-play discovery round before debate
10 | Stochastic Noise | REAL human messiness | Controlled randomness in decision-making

Layers 1–6 are pre-computed for all 2.5 million archetypes. Layers 7–10 are applied at simulation time based on the specific content, product, or policy being tested. This means every simulation is contextually modulated — the same archetype reacts differently to a luxury product vs. a budget SaaS tool.

2. Archetype Construction Methodology

2.1 Population-Proportional Sampling

echotest maintains a database of 2,521,382 synthetic archetypes distributed across 195 countries. Each country's archetype count is proportional to its population, so India (1.4B people) has far more archetypes than Luxembourg (650K).

Within each country, archetypes are stratified across 5 demographic dimensions:

Source: UN World Population Prospects 2024, World Bank World Development Indicators
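
One way to picture population-proportional allocation is the sketch below. It is a minimal illustration, not echotest's actual allocator: the function name `allocate_archetypes`, the largest-remainder rounding rule, and the rounded population figures are all assumptions made for the example.

```python
from math import floor

def allocate_archetypes(populations: dict[str, int], total: int) -> dict[str, int]:
    """Split `total` archetypes across countries in proportion to population.

    Assumption: largest-remainder rounding, so the counts sum exactly to `total`.
    """
    world = sum(populations.values())
    quotas = {c: total * p / world for c, p in populations.items()}
    counts = {c: floor(q) for c, q in quotas.items()}
    leftover = total - sum(counts.values())
    # Give the remaining slots to the countries with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1
    return counts

# Illustrative, rounded populations only (not echotest's actual inputs).
populations = {"India": 1_400_000_000, "Japan": 124_000_000, "Luxembourg": 650_000}
counts = allocate_archetypes(populations, total=10_000)
```

With strict proportionality a very small country can round down to a handful of archetypes, so a production allocator would probably also enforce a per-country minimum. That detail is not described in this document and is left out of the sketch.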

2.1b Why we go down to the neighborhood

A country average hides more than it reveals. The person browsing on Palm Jumeirah doesn't buy like the person in Deira. A Riyadh executive doesn't shop like a Hofuf factory worker. A Mumbai Bandra resident doesn't make the same decisions as someone in Dharavi. Treating them all as “the average Indian / Saudi / Emirati consumer” throws away the very variation that decides whether your product lands or stalls.

In our key markets we build separate audiences for distinct neighborhoods — their own income mix, religious composition, language, brand reach, and cultural texture. So when you test a luxury launch, the panel that responds is shaped by the streets it would actually sell on, not a country-wide mean that smooths the signal away.

A few of the contrasts we model:

Dubai: Palm Jumeirah vs Deira
Tokyo: Minato vs Adachi
Mumbai: Bandra vs Dharavi
Riyadh: Sulamaniyah vs Hofuf
London: Kensington vs Tower Hamlets
New York: Upper East Side vs South Bronx

For markets we haven't gone deep on yet, the panel falls back to country-level distributions — still population-proportional, still culturally anchored, just without the neighborhood split.
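
The fallback described above can be sketched as a two-level lookup. Everything in this example is hypothetical: the registry dictionaries, the panel identifiers, and the `resolve_panel` function are placeholders chosen for illustration, not echotest's data model.

```python
from typing import Optional

# Hypothetical registries: keys and panel identifiers are placeholders.
NEIGHBORHOOD_PANELS = {
    ("AE", "Palm Jumeirah"): "panel:AE:palm-jumeirah",
    ("AE", "Deira"): "panel:AE:deira",
}
COUNTRY_PANELS = {"AE": "panel:AE", "BR": "panel:BR"}

def resolve_panel(country: str, neighborhood: Optional[str] = None) -> str:
    """Prefer a neighborhood-level panel; otherwise fall back to the country panel."""
    if neighborhood is not None:
        panel = NEIGHBORHOOD_PANELS.get((country, neighborhood))
        if panel is not None:
            return panel
    # No neighborhood split modeled for this market: use country-level distributions.
    return COUNTRY_PANELS[country]
```

For example, `resolve_panel("AE", "Deira")` selects the neighborhood panel, while `resolve_panel("BR", "Ipanema")` falls back to the country-level panel because no neighborhood split is registered for that market.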

2.2 Cultural Dimensions

Each archetype inherits cultural dimensions from its country, sourced from the Hofstede Institute's 6-Dimension model and the World Values Survey. These dimensions shape how archetypes process information, respond to authority, handle uncertainty, and make purchasing decisions.

Power Distance: Acceptance of hierarchical authority
Individualism: Self vs. group orientation
Masculinity: Achievement vs. quality of life
Uncertainty Avoidance: Comfort with ambiguity and risk
Long-Term Orientation: Planning horizon and tradition
Indulgence: Impulse control and gratification

Source: Hofstede Insights (hofstede-insights.com), World Values Survey Wave 7 (2017-2022)

2.3 Transaction DNA

Beyond demographics and culture, each archetype carries 35+ behavioral fields that model purchasing psychology. These are computed from the intersection of income, education, cultural dimensions, and sector-specific modifiers.

Spending Power: Disposable income, price sensitivity, luxury vs. value, savings rate
Purchase Psychology: Impulse score, research depth, social proof need, FOMO, risk aversion
Brand & Loyalty: Switching cost, deal sensitivity, referral propensity, review tendency
Channel & Journey: Online vs. offline, mobile comfort, ad receptivity, cart abandonment risk

2.4 OCEAN Personality (Big Five)

Each archetype's Big Five personality profile is computed mathematically from Hofstede cultural dimensions + Transaction DNA fields. Zero LLM calls are used in personality computation — it is a pure mathematical derivation that runs deterministically.

Derivation Formula (simplified):

Openness = 0.35×(1-UAI/100) + 0.30×novelty_seeking + 0.20×(IDV/100) + 0.15×tech_adoption
Conscientiousness = 0.30×(UAI/100) + 0.25×research_depth + 0.25×(LTO/100) + 0.20×savings_rate
Extraversion = 0.30×(IVR/100) + 0.25×influence_score + 0.20×social_proof + 0.25×engagement
Agreeableness = 0.30×(1-IDV/100) + 0.25×(1-MAS/100) + 0.20×susceptibility + 0.25×(1-risk)
Neuroticism = 0.35×(UAI/100) + 0.25×risk_aversion + 0.20×loss_aversion + 0.20×(1-IVR/100)
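
The simplified formulas above translate directly into a small deterministic function. The sketch below assumes that Hofstede scores (UAI, IDV, MAS, LTO, IVR, using the standard Hofstede abbreviations) are on a 0-100 scale and that every Transaction DNA field is already normalized to 0-1. The function name `ocean_profile` and the dictionary layout are illustrative choices, not part of the published method.

```python
def ocean_profile(hofstede: dict[str, float], dna: dict[str, float]) -> dict[str, float]:
    """Compute Big Five scores from the simplified weighted sums.

    Assumptions: Hofstede scores are 0-100; DNA fields are 0-1.
    """
    # Rescale Hofstede dimensions to 0-1 once, as the formulas divide by 100.
    uai, idv, mas, lto, ivr = (
        hofstede[k] / 100 for k in ("UAI", "IDV", "MAS", "LTO", "IVR")
    )
    return {
        "openness": 0.35 * (1 - uai) + 0.30 * dna["novelty_seeking"]
                    + 0.20 * idv + 0.15 * dna["tech_adoption"],
        "conscientiousness": 0.30 * uai + 0.25 * dna["research_depth"]
                             + 0.25 * lto + 0.20 * dna["savings_rate"],
        "extraversion": 0.30 * ivr + 0.25 * dna["influence_score"]
                        + 0.20 * dna["social_proof"] + 0.25 * dna["engagement"],
        "agreeableness": 0.30 * (1 - idv) + 0.25 * (1 - mas)
                         + 0.20 * dna["susceptibility"] + 0.25 * (1 - dna["risk"]),
        "neuroticism": 0.35 * uai + 0.25 * dna["risk_aversion"]
                       + 0.20 * dna["loss_aversion"] + 0.20 * (1 - ivr),
    }

# A perfectly "neutral" archetype: every input sits at its midpoint.
neutral_hofstede = {k: 50.0 for k in ("UAI", "IDV", "MAS", "LTO", "IVR")}
neutral_dna = {k: 0.5 for k in (
    "novelty_seeking", "tech_adoption", "research_depth", "savings_rate",
    "influence_score", "social_proof", "engagement", "susceptibility",
    "risk", "risk_aversion", "loss_aversion",
)}
neutral_profile = ocean_profile(neutral_hofstede, neutral_dna)
```

Because the weights in each formula sum to 1.0, inputs in the 0-1 range always produce trait scores in the 0-1 range, and the neutral archetype lands at 0.5 on every trait.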

3. Simulation Pipeline

When a user submits content for simulation, the following pipeline executes:

1. Content Analysis (~3s)
LLM classifies content type, topic, tone, controversy potential, cultural sensitivity signals, and inferred target audience.

2. Agent Sampling (~5s)
Stratified population-proportional sampling selects agents from PostgreSQL. Qdrant vector search boosts content-relevant archetypes. Neo4j ensures high-influence nodes are included.

3. Context Injection (~2s)
Each agent receives its full persona prompt, including cultural dimensions, behavioral DNA, OCEAN traits, real-time intelligence (news, events), and identity modulation based on content type.

4. Three-Round Debate (~45s)
Round 1: Independent reaction (no social influence). Round 2: Confrontation, in which agents see peer arguments and can shift position. Round 3: Neo4j cascade, in which influence propagates through the social graph.

5. Linguistic Calibration (~10s)
Raw LLM responses are refined to match each archetype's communication style, formality, and platform conventions.

6. Synthesis & Analytics (~8s)
Sentiment aggregation, SWOT analysis, psychological insights (JTBD, cognitive biases), demographic breakdowns, NPS, virality scoring, price elasticity, and revenue forecasting.

7. Calibration Weights (<1s)
If the user has past feedback loops (Level 2+), historical accuracy data adjusts sentiment predictions, segment weights, and confidence scores.

Total simulation time for 200 agents: ~60-90 seconds. For 2,000 agents: ~5-8 minutes. Results are streamed in real time via WebSocket as each stage completes.
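
As a quick arithmetic check, the per-stage estimates listed in the pipeline can be summed and compared with the quoted 60-90 second total for 200 agents. The sketch treats the "<1s" calibration stage as a 1-second upper bound, which is an assumption; the stage names and durations come from the list above.

```python
# Approximate per-stage durations in seconds, taken from the pipeline description.
# "Calibration Weights" is listed as "<1s"; 1 is used here as an upper bound.
STAGE_SECONDS = [
    ("Content Analysis", 3),
    ("Agent Sampling", 5),
    ("Context Injection", 2),
    ("Three-Round Debate", 45),
    ("Linguistic Calibration", 10),
    ("Synthesis & Analytics", 8),
    ("Calibration Weights", 1),
]

total_seconds = sum(seconds for _, seconds in STAGE_SECONDS)
debate_share = 45 / total_seconds  # fraction of the run spent in the debate stage

for name, seconds in STAGE_SECONDS:
    print(f"{name:<24}{seconds:>3}s  {seconds / total_seconds:6.1%}")
print(f"{'Total':<24}{total_seconds:>3}s")
```

The stages add up to roughly 74 seconds, inside the stated range, and the three-round debate accounts for about 60% of the run.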

4. Calibration Framework

echotest does not claim oracle-level accuracy. Instead, every prediction is accompanied by explicit confidence metrics, reliability grades, and calibration level indicators. The system is designed to be honestly uncertain and to improve over time.

4.1 Confidence Scoring

Confidence scores (0–100%) are computed from four weighted components:

Component | Weight | How It's Computed
Sample Adequacy | 30% | Agent count: 500+ = full marks, <50 = minimum
Country Coverage | 25% | More countries = higher geographic representativeness
Demographic Coverage | 25% | Unique age groups × gender diversity in sample
Debate Quality | 20% | Position shift rate: did agents actually change their minds?
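
A minimal sketch of the weighted combination follows. The weights come from the table. Everything else is an assumption for illustration: each component is taken to be a score between 0 and 1, the sample-adequacy curve is assumed to ramp linearly between 50 and 500 agents, and the `floor` value of 0.2 is a hypothetical stand-in for the unspecified "minimum".

```python
WEIGHTS = {
    "sample_adequacy": 0.30,
    "country_coverage": 0.25,
    "demographic_coverage": 0.25,
    "debate_quality": 0.20,
}

def sample_adequacy(n_agents: int, floor: float = 0.2) -> float:
    """Full marks at 500+ agents, `floor` below 50 (hypothetical value),
    and an assumed linear ramp in between."""
    if n_agents >= 500:
        return 1.0
    if n_agents < 50:
        return floor
    return floor + (1.0 - floor) * (n_agents - 50) / 450

def confidence_score(components: dict[str, float]) -> float:
    """Weighted sum of 0-1 component scores, expressed on a 0-100 scale."""
    return 100 * sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

example = confidence_score({
    "sample_adequacy": sample_adequacy(200),
    "country_coverage": 0.6,       # placeholder component scores
    "demographic_coverage": 0.8,
    "debate_quality": 0.5,
})
```

Since the weights sum to 1.0, a simulation that earns full marks on every component scores 100, and a weak debate alone can cost at most 20 points.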

4.2 Reliability Grades

Every report receives a reliability grade based on sample size, using Wilson score confidence intervals:

Grade | Agents | Margin
A | 500+ | ±3.5%
B+ | 200-500 | ±5-7%
B | 100-200 | ±7-10%
C | 50-100 | ±10-14%
D | <50 | ±14%+

Based on Wilson score interval for binomial proportions (Wilson, 1927). Margin of error computed at 95% confidence level.
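
For readers who want to reproduce the interval, the sketch below implements the standard Wilson score interval at z = 1.96 (95% confidence) together with a grade lookup based on the thresholds above. The function names are illustrative, and assigning the boundary counts (exactly 500, 200, 100, or 50 agents) to the higher grade is an assumption, since the published ranges overlap at their edges.

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (Wilson, 1927)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

def reliability_grade(n_agents: int) -> str:
    """Map sample size to a grade; boundary handling is an assumption."""
    if n_agents >= 500:
        return "A"
    if n_agents >= 200:
        return "B+"
    if n_agents >= 100:
        return "B"
    if n_agents >= 50:
        return "C"
    return "D"

# Worst case for the margin: an even 50/50 split among 500 agents.
low, high = wilson_interval(250, 500)
margin = (high - low) / 2
```

The margin depends on the observed proportion as well as the sample size. An even split is the widest case (roughly ±4.4 points at 500 agents), and the interval narrows as responses move toward consensus, so any single margin figure per grade implicitly assumes a particular proportion.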

4.3 Calibration Levels (The Learning Loop)

The system improves per customer through a closed-loop feedback mechanism. Users who feed real campaign results back receive progressively more accurate predictions.

Level | Name | Requirement | What Changes
0 | New | Default | Pure simulation, generic benchmarks only
1 | Benchmarked | 1st simulation | Industry benchmarks applied to scoring
2 | Calibrating | 3+ outcome reports | User-specific sentiment/segment adjustments from historical accuracy data
3 | Calibrated | 10+ outcomes, >70% avg accuracy | ML-optimized weights per segment, per country, per content type
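
The level requirements in the table can be read as a simple decision rule. The sketch below is one plausible encoding: the function name and its three inputs are assumptions, and it treats ">70%" as a strict inequality and accuracy as a fraction between 0 and 1.

```python
from typing import Optional

def calibration_level(
    simulations_run: int,
    outcome_reports: int,
    avg_accuracy: Optional[float],
) -> int:
    """Return the calibration level (0-3) implied by the requirements table.

    `avg_accuracy` is a fraction in [0, 1], or None when no outcomes exist yet.
    """
    if outcome_reports >= 10 and avg_accuracy is not None and avg_accuracy > 0.70:
        return 3  # Calibrated
    if outcome_reports >= 3:
        return 2  # Calibrating
    if simulations_run >= 1:
        return 1  # Benchmarked
    return 0      # New
```

Under this reading, an account with 12 outcome reports but exactly 70% average accuracy stays at Level 2 until its accuracy rises above the threshold.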

4.4 Lifecycle Calibration Loop (How it works in practice)

The Lifecycle Simulator (Engine 5) uses a market-bucket calibration system that tracks systematic prediction deltas per (country, category, dampening mode). Each completed loop improves the bucket's rolling stats and sharpens future predictions for the same market.

Each calibration bucket is keyed by (country, category, dampening_mode). For example, after 7 verified outcomes, an SA × streaming bucket reaches high confidence and recommends pre-filled values for every future simulation.
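
A bucket of this kind can be sketched as a dictionary of rolling statistics keyed by the (country, category, dampening_mode) tuple. The class and function names below are hypothetical, the "medium" tier at 3 or more outcomes is an assumption (the text only states that 7 verified outcomes yield high confidence), and the `"aware"` dampening mode and the penetration figures are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class BucketStats:
    n: int = 0               # number of verified outcomes recorded
    mean_delta: float = 0.0  # rolling mean of (actual - predicted)

    def update(self, delta: float) -> None:
        # Incremental mean: no need to store the full history.
        self.n += 1
        self.mean_delta += (delta - self.mean_delta) / self.n

    @property
    def confidence(self) -> str:
        if self.n >= 7:
            return "high"
        if self.n >= 3:   # assumed intermediate tier, not stated in the text
            return "medium"
        return "low"

# One BucketStats per (country, category, dampening_mode) key.
buckets = defaultdict(BucketStats)

def record_outcome(country: str, category: str, dampening_mode: str,
                   predicted: float, actual: float) -> BucketStats:
    stats = buckets[(country, category, dampening_mode)]
    stats.update(actual - predicted)
    return stats

# Seven verified outcomes where actual penetration ran 2 points above prediction.
for _ in range(7):
    stats = record_outcome("SA", "streaming", "aware", predicted=0.10, actual=0.12)
```

After the loop, the bucket holds seven samples with a mean delta of about +0.02, which a later simulation for the same key could use as a suggested correction.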

5. Validation: Prediction vs. Reality

echotest predictions are continuously backtested against actual campaign outcomes. When users submit real-world results through the Decision Intelligence system, the comparison engine computes:

Sample Validation (Early Access)

Avg Sentiment Accuracy: 84% (across 50+ backtested campaigns)
Avg Conversion Accuracy: 78% (commerce simulations only)
Improvement at Level 2+: +12% (vs. Level 0 baseline)

6. Comparison to Traditional Research

echotest is not a replacement for all market research. It is a complement that excels at speed, breadth, and early-stage hypothesis testing. Here is an honest comparison:

Dimension | Traditional Research | Generic AI (ChatGPT) | echotest
Speed | 2-6 weeks | Minutes (unstructured) | 60-90 seconds
Cost per study | $10K-$100K | $0-$20 | $10-$20 per simulation
Market coverage | 1-3 countries | Generic (no real data) | 195 countries
Cultural depth | Basic demographics | None | 6 Hofstede dims + WVS + 35 DNA fields
Sample size | 500-2,000 real humans | 1 model | 50-2,000 structured archetypes
Repeatability | Low (different respondents) | Low (temperature variance) | High (deterministic archetypes)
Learns over time | No | No | Yes (Calibration Levels 0-3)
Statistical rigor | High (real data) | None | Moderate (Wilson intervals, reliability grades)
Ideal use case | Final validation | Quick brainstorming | Pre-launch stress testing

When to use echotest: Before you commit budget. Test 10 ideas in the time it takes to brief one focus group. Narrow your options, then validate the winner with traditional research if the stakes justify it.

7. Limitations & Honest Uncertainty

Scientific honesty is a core principle of echotest. We explicitly acknowledge the following limitations:

Synthetic, not real

Archetypes are statistical constructs, not real humans. They cannot capture individual lived experience, recent personal events, or genuine emotional states.

LLM-dependent behavior

Agent responses are generated by large language models. While culturally calibrated through persona prompts, they inherit LLM biases and limitations.

No causal claims

echotest identifies correlations and directional indicators. It does not establish causal relationships between content and outcomes.

Cultural dimensions are averages

Hofstede scores represent national averages. Individual variation within countries can be enormous. Sub-regional overrides partially address this for diverse nations (India, Nigeria, etc.).

Calibration requires feedback

The system only improves if users submit actual results. Without feedback loops, predictions remain at Level 0 (uncalibrated).

Every echotest report includes a disclaimer: “Results are directional indicators based on synthetic agent responses. Not a statistical prediction. Real-world outcomes may differ. Use results as one input among many in your decision-making process.”

8. Data Sources & Citations

Demographics

United Nations, Department of Economic and Social Affairs, Population Division (2024). World Population Prospects 2024.

Economics

World Bank. World Development Indicators (2024). GDP per capita, internet penetration, Gini index, literacy rates.

Culture

Hofstede, G. (2011). Dimensionalizing Cultures: The Hofstede Model in Context. Online Readings in Psychology and Culture, 2(1). Data: hofstede-insights.com

Values

Inglehart, R. et al. World Values Survey Wave 7 (2017-2022). worldvaluessurvey.org

Personality

McCrae, R.R. & John, O.P. (1992). An Introduction to the Five-Factor Model and Its Applications. Journal of Personality, 60(2), 175-215.

JTBD

Christensen, C.M. et al. (2016). Know Your Customers’ "Jobs to Be Done." Harvard Business Review.

Statistics

Wilson, E.B. (1927). Probable Inference, the Law of Succession, and Statistical Inference. Journal of the American Statistical Association, 22(158), 209-212.

Platforms

DataReportal (2024). Digital 2024 Global Overview Report. Social media penetration and platform-specific demographics.

Press Freedom

Reporters Without Borders (2024). World Press Freedom Index.

© 2026 echotest. All rights reserved.