Technical Whitepaper

Methodology & Scientific Framework

How echotest constructs, simulates, and calibrates synthetic audience responses across 195 markets using census-anchored archetypes and multi-round debate dynamics.

Version 2.0 · April 2026 · echotest Research

Contents

1. The 10-Layer Intelligence Stack

2. Archetype Construction Methodology

3. Simulation Pipeline

4. Calibration Framework

5. Validation: Prediction vs. Reality

6. Comparison to Traditional Research

7. Limitations & Honest Uncertainty

8. Data Sources & Citations

1. The 10-Layer Intelligence Stack

Every echotest simulation passes through a 10-layer pipeline. Each layer adds a dimension of behavioral realism that generic AI tools cannot replicate. The layers are cumulative — each builds on the output of the previous one.

Layer	Name	What It Captures	Source
1	Demographics	WHO they are	2.5M archetypes from UN population data
2	Cultural DNA	HOW they think	Hofstede 6D + World Values Survey
3	Transaction DNA	CAN they buy	35 behavioral fields per archetype
4	Persona Intel	WHY they buy	JTBD framework + life stage + brand affinity
5	Brand Intelligence	WHAT they trust	GPT analysis + resonance scoring
6	OCEAN Personality	WHO they really are	Big Five derived from Hofstede + DNA
7	Identity Modulation	CONTEXT shifts behavior	6 modes: professional, family, luxury, impulse, health, tech
8	Agent Memory	HISTORY matters	Persistent experience across simulations
9	Emergence Engine	UNEXPECTED insights	Free-play discovery round before debate
10	Stochastic Noise	REAL human messiness	Controlled randomness in decision-making

Layers 1–6 are pre-computed for all 2.5 million archetypes. Layers 7–10 are applied at simulation time based on the specific content, product, or policy being tested. This means every simulation is contextually modulated — the same archetype reacts differently to a luxury product vs. a budget SaaS tool.

2. Archetype Construction Methodology

2.1 Population-Proportional Sampling

echotest maintains a database of 2,521,382 synthetic archetypes distributed across 195 countries. Each country's archetype count is proportional to its population, so India (1.4B people) receives far more archetypes than Luxembourg (650K).

Within each country, archetypes are stratified across 5 demographic dimensions:

Age group — 7 buckets (13-17 through 65+), weighted by national age distribution
Gender — Male/Female, weighted by national ratio
Income bracket — 5 tiers (low through high), calibrated to GDP per capita
Education level — 6 levels (primary through doctorate)
Urban/Rural — Weighted by national urbanization rate

Source: UN World Population Prospects 2024, World Bank World Development Indicators
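
The population-proportional split can be sketched as follows. This is a minimal illustration, not the production allocator: the function name `allocate_archetypes`, the largest-remainder rounding rule, and the rounded example populations are all assumptions made for the example.

```python
from math import floor


def allocate_archetypes(populations: dict[str, int], total: int) -> dict[str, int]:
    """Split `total` archetypes across countries in proportion to population.

    Uses largest-remainder rounding (an assumption) so that the integer
    counts sum exactly to `total`.
    """
    world = sum(populations.values())
    quotas = {c: total * p / world for c, p in populations.items()}
    counts = {c: floor(q) for c, q in quotas.items()}
    leftover = total - sum(counts.values())
    # Give the remaining slots to the countries with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for country in by_remainder[:leftover]:
        counts[country] += 1
    return counts


# Illustrative, rounded populations only (not the UN figures themselves).
example = {"India": 1_400_000_000, "Japan": 124_000_000, "Luxembourg": 650_000}
counts = allocate_archetypes(example, total=10_000)
```

Within each country, the same proportional rule could be applied again per stratum (age group, gender, income bracket, education level, urban/rural) using the national distributions as weights.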

2.1b Why we go down to the neighborhood

A country average hides more than it reveals. The person browsing on Palm Jumeirah doesn't buy like the person in Deira. A Riyadh executive doesn't shop like a Hofuf factory worker. A Mumbai Bandra resident doesn't make the same decisions as someone in Dharavi. Treating them all as “the average Indian / Saudi / Emirati consumer” throws away the very variation that decides whether your product lands or stalls.

In our key markets we build separate audiences for distinct neighborhoods — their own income mix, religious composition, language, brand reach, and cultural texture. So when you test a luxury launch, the panel that responds is shaped by the streets it would actually sell on, not a country-wide mean that smooths the signal away.

A few of the contrasts we model:

Dubai: Palm Jumeirah vs Deira
Tokyo: Minato vs Adachi
Mumbai: Bandra vs Dharavi
Riyadh: Sulamaniyah vs Hofuf
London: Kensington vs Tower Hamlets
New York: Upper East Side vs South Bronx

For markets we haven't gone deep on yet, the panel falls back to country-level distributions — still population-proportional, still culturally anchored, just without the neighborhood split.

2.2 Cultural Dimensions

Each archetype inherits cultural dimensions from its country, sourced from the Hofstede Institute's 6-Dimension model and the World Values Survey. These dimensions shape how archetypes process information, respond to authority, handle uncertainty, and make purchasing decisions.

Power Distance: Acceptance of hierarchical authority
Individualism: Self vs. group orientation
Masculinity: Achievement vs. quality of life
Uncertainty Avoidance: Comfort with ambiguity and risk
Long-Term Orientation: Planning horizon and tradition
Indulgence: Impulse control and gratification

Source: Hofstede Insights (hofstede-insights.com), World Values Survey Wave 7 (2017-2022)

2.3 Transaction DNA

Beyond demographics and culture, each archetype carries 35+ behavioral fields that model purchasing psychology. These are computed from the intersection of income, education, cultural dimensions, and sector-specific modifiers.

To be explicit: we did not survey 2.5 million individuals — that isn't feasible. Each field is estimated from established correlations in the psychometric and consumer-behavior literature (for example, price sensitivity rises as income falls; agreeableness trends higher in collectivist cultures), then seeded deterministically so the same archetype always resolves to the same profile. We validate the direction and strength of these relationships rather than claiming per-person ground truth.

Spending Power: Disposable income, price sensitivity, luxury vs. value, savings rate
Purchase Psychology: Impulse score, research depth, social proof need, FOMO, risk aversion
Brand & Loyalty: Switching cost, deal sensitivity, referral propensity, review tendency
Channel & Journey: Online vs. offline, mobile comfort, ad receptivity, cart abandonment risk
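
A rough sketch of how one field could be estimated from a known correlation and seeded deterministically is shown below. It is illustrative only: the helper names (`seeded_rng`, `price_sensitivity`), the SHA-256 seeding scheme, the linear baseline, and the ±0.1 jitter are assumptions, not the production model. It uses the income/price-sensitivity relationship described above.

```python
import hashlib
import random


def seeded_rng(archetype_id: str, field: str) -> random.Random:
    """Return an RNG whose stream depends only on (archetype_id, field)."""
    digest = hashlib.sha256(f"{archetype_id}:{field}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def price_sensitivity(archetype_id: str, income_tier: int, tiers: int = 5) -> float:
    """Estimate price sensitivity in [0, 1]; higher for lower income tiers.

    income_tier runs from 1 (low) to `tiers` (high). The linear baseline and
    the +/-0.1 jitter are illustrative assumptions.
    """
    baseline = 1.0 - (income_tier - 1) / (tiers - 1)  # tier 1 -> 1.0, top tier -> 0.0
    jitter = seeded_rng(archetype_id, "price_sensitivity").uniform(-0.1, 0.1)
    return min(1.0, max(0.0, 0.8 * baseline + 0.1 + jitter))


# "IN-000123" is a hypothetical archetype ID.
first = price_sensitivity("IN-000123", income_tier=1)
second = price_sensitivity("IN-000123", income_tier=1)  # identical to `first`
```

Because the seed is derived from the archetype ID and field name rather than from global state, re-running the computation in any order yields the same profile.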

2.4 OCEAN Personality (Big Five)

Each archetype's Big Five personality profile is computed mathematically from Hofstede cultural dimensions + Transaction DNA fields. Zero LLM calls are used in personality computation — it is a pure mathematical derivation that runs deterministically.

Derivation Formula (simplified):

Openness = 0.35×(1-UAI/100) + 0.30×novelty_seeking + 0.20×(IDV/100) + 0.15×tech_adoption
Conscientiousness = 0.30×(UAI/100) + 0.25×research_depth + 0.25×(LTO/100) + 0.20×savings_rate
Extraversion = 0.30×(IVR/100) + 0.25×influence_score + 0.20×social_proof + 0.25×engagement
Agreeableness = 0.30×(1-IDV/100) + 0.25×(1-MAS/100) + 0.20×susceptibility + 0.25×(1-risk)
Neuroticism = 0.35×(UAI/100) + 0.25×risk_aversion + 0.20×loss_aversion + 0.20×(1-IVR/100)
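
The simplified formula above translates directly into code. The sketch below assumes Hofstede scores on a 0-100 scale (clamped, since some published scores exceed 100) and Transaction DNA fields already normalized to 0-1; the dictionary keys mirror the names in the formula, and the function name `ocean` is hypothetical.

```python
def clamp01(x: float) -> float:
    """Clamp a value to the [0, 1] range."""
    return min(1.0, max(0.0, x))


def ocean(h: dict[str, float], d: dict[str, float]) -> dict[str, float]:
    """Compute Big Five scores in [0, 1] from Hofstede scores `h` and DNA fields `d`."""
    uai, idv, mas, lto, ivr = (
        clamp01(h[k] / 100) for k in ("UAI", "IDV", "MAS", "LTO", "IVR")
    )
    return {
        "openness": 0.35 * (1 - uai) + 0.30 * d["novelty_seeking"]
                    + 0.20 * idv + 0.15 * d["tech_adoption"],
        "conscientiousness": 0.30 * uai + 0.25 * d["research_depth"]
                             + 0.25 * lto + 0.20 * d["savings_rate"],
        "extraversion": 0.30 * ivr + 0.25 * d["influence_score"]
                        + 0.20 * d["social_proof"] + 0.25 * d["engagement"],
        "agreeableness": 0.30 * (1 - idv) + 0.25 * (1 - mas)
                         + 0.20 * d["susceptibility"] + 0.25 * (1 - d["risk"]),
        "neuroticism": 0.35 * uai + 0.25 * d["risk_aversion"]
                       + 0.20 * d["loss_aversion"] + 0.20 * (1 - ivr),
    }


# Neutral inputs: every Hofstede score at 50 and every DNA field at 0.5.
hofstede = {k: 50 for k in ("UAI", "IDV", "MAS", "LTO", "IVR")}
dna = {k: 0.5 for k in (
    "novelty_seeking", "tech_adoption", "research_depth", "savings_rate",
    "influence_score", "social_proof", "engagement", "susceptibility",
    "risk", "risk_aversion", "loss_aversion",
)}
profile = ocean(hofstede, dna)
```

Each trait's weights sum to 1.0, so a fully neutral archetype lands at the midpoint on every trait and all outputs stay within 0-1.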

3. Simulation Pipeline

When a user submits content for simulation, the following pipeline executes:

Content Analysis (~3s)

LLM classifies content type, topic, tone, controversy potential, cultural sensitivity signals, and target audience inference.

Agent Sampling (~5s)

Stratified population-proportional sampling selects agents from PostgreSQL. Qdrant vector search boosts with content-relevant archetypes. Neo4j ensures high-influence nodes are included.

Context Injection (~2s)

Each agent receives their full persona prompt including cultural dimensions, behavioral DNA, OCEAN traits, real-time intelligence (news, events), and identity modulation based on content type.

Three-Round Debate (~45s)

Round 1: Independent reaction (no social influence). Round 2: Confrontation — agents see peer arguments and can shift position. Round 3: Neo4j cascade — influence propagates through the social graph.

Linguistic Calibration (~10s)

Raw LLM responses are refined to match each archetype's communication style, formality, and platform conventions.

Synthesis & Analytics (~8s)

Sentiment aggregation, SWOT analysis, psychological insights (JTBD, cognitive biases), demographic breakdowns, NPS, virality scoring, price elasticity, and revenue forecasting.

Calibration Weights (<1s)

If the user has past feedback loops (Level 2+), historical accuracy data adjusts sentiment predictions, segment weights, and confidence scores.

Total simulation time for 200 agents: ~60-90 seconds. For 2,000 agents: ~5-8 minutes. Results are streamed in real time via WebSocket as each stage completes.

4. Calibration Framework

echotest does not claim oracle-level accuracy. Instead, every prediction is accompanied by explicit confidence metrics, reliability grades, and calibration level indicators. The system is designed to be honestly uncertain and to improve over time.

4.1 Confidence Scoring

Confidence scores (0–100%) are computed from four weighted components:

Component	Weight	How It's Computed
Sample Adequacy	30%	Agent count: 500+ = full marks, <50 = minimum
Country Coverage	25%	More countries = higher geographic representativeness
Demographic Coverage	25%	Unique age groups × gender diversity in sample
Debate Quality	20%	Position shift rate — did agents actually change their minds?
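
As a minimal sketch of the weighting in the table above: the weights come from the table, but the linear ramp between 50 and 500 agents in `sample_adequacy`, the 0-1 scale of the component scores, and both function names are assumptions made for illustration.

```python
WEIGHTS = {
    "sample_adequacy": 0.30,
    "country_coverage": 0.25,
    "demographic_coverage": 0.25,
    "debate_quality": 0.20,
}


def sample_adequacy(n_agents: int) -> float:
    """Full marks at 500+ agents, minimum below 50, linear in between (assumed)."""
    if n_agents >= 500:
        return 1.0
    if n_agents < 50:
        return 0.0
    return (n_agents - 50) / (500 - 50)


def confidence_score(components: dict[str, float]) -> float:
    """Combine component scores (each in [0, 1]) into a 0-100% confidence."""
    total = sum(WEIGHTS[k] * min(1.0, max(0.0, components[k])) for k in WEIGHTS)
    return 100 * total


score = confidence_score({
    "sample_adequacy": sample_adequacy(275),  # halfway up the assumed ramp: 0.5
    "country_coverage": 0.8,
    "demographic_coverage": 0.6,
    "debate_quality": 0.4,
})
```

With these example inputs the weighted sum is 0.15 + 0.20 + 0.15 + 0.08 = 0.58, i.e. a confidence of 58%.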

4.2 Reliability Grades

Every report receives a reliability grade based on sample size, using Wilson score confidence intervals:

500+ agents

Margin: ±3.5%

B+

200-500 agents

Margin: ±5-7%

100-200 agents

Margin: ±7-10%

50-100 agents

Margin: ±10-14%

<50 agents

Margin: ±14%+

Based on Wilson score interval for binomial proportions (Wilson, 1927). Margin of error computed at 95% confidence level.
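
For reference, the Wilson score interval can be computed as in the sketch below. The function name and the example inputs are illustrative; z = 1.96 corresponds to the 95% confidence level, and an observed proportion of 0.5 is used in the example because it gives the widest interval for a given sample size (other proportions yield narrower margins).

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (Wilson, 1927)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Example: 100 agents, half of them positive (the widest case for n = 100).
low, high = wilson_interval(50, 100)
margin = (high - low) / 2
```

Unlike the simpler normal-approximation interval, the Wilson interval stays inside 0-1 even for small samples or proportions near the extremes, which matters for panels of fewer than 50 agents.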

4.3 Calibration Levels (The Learning Loop)

The system improves per customer through a closed-loop feedback mechanism. Users who feed real campaign results back receive progressively more accurate predictions.

Level	Name	Requirement	What Changes
0	New	Default	Pure simulation, generic benchmarks only
1	Benchmarked	1st simulation	Industry benchmarks applied to scoring
2	Calibrating	3+ outcome reports	User-specific sentiment/segment adjustments from historical accuracy data
3	Calibrated	10+ outcomes, >70% avg accuracy	Deeper per-segment, per-market weighting (in active development)
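
The level rules in the table can be expressed as a small function. This is a sketch of the thresholds as stated, with hypothetical parameter names; how average accuracy is actually measured is not specified here, so it is simply passed in as a fraction between 0 and 1.

```python
def calibration_level(simulations_run: int, outcome_reports: int,
                      avg_accuracy: float) -> int:
    """Map a customer's history to a calibration level (0-3)."""
    if outcome_reports >= 10 and avg_accuracy > 0.70:
        return 3  # Calibrated
    if outcome_reports >= 3:
        return 2  # Calibrating
    if simulations_run >= 1:
        return 1  # Benchmarked
    return 0      # New


level = calibration_level(simulations_run=20, outcome_reports=12, avg_accuracy=0.75)
```

Note that a customer with 10+ outcomes but average accuracy at or below 70% stays at Level 2 under these rules.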

4.4 Lifecycle Calibration Loop (How it works in practice)

The Lifecycle Simulator (Engine 5) uses a market-bucket calibration system that tracks systematic prediction deltas per (country, category, dampening mode). Each completed loop improves the bucket's rolling stats and sharpens future predictions for the same market.

Each calibration bucket is keyed by (country, category, dampening_mode). For example, once the SA × streaming bucket has 7 verified outcomes, it reaches high confidence and recommends pre-filled values for every future simulation in that bucket.
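
A bucket keyed by (country, category, dampening_mode) could track its rolling statistics roughly as follows. This is a sketch under stated assumptions: the class name `BucketStats`, the running-mean update, the "medium" threshold at 3 outcomes, the `"default"` dampening mode, and the example numbers are all hypothetical; only the 7-outcome threshold for high confidence comes from the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BucketStats:
    n: int = 0                # verified outcomes seen so far
    mean_delta: float = 0.0   # running mean of (actual - predicted)

    def update(self, predicted: float, actual: float) -> None:
        """Fold one verified outcome into the running mean."""
        self.n += 1
        self.mean_delta += ((actual - predicted) - self.mean_delta) / self.n

    @property
    def confidence(self) -> str:
        if self.n >= 7:
            return "high"
        if self.n >= 3:       # assumed intermediate threshold
            return "medium"
        return "low"


buckets: dict[tuple[str, str, str], BucketStats] = defaultdict(BucketStats)
key = ("SA", "streaming", "default")  # "default" is a hypothetical dampening mode

# Seven made-up outcomes where the prediction overshoots by 2 points each time.
for predicted, actual in [(0.10, 0.08)] * 7:
    buckets[key].update(predicted, actual)

stats = buckets[key]  # stats.n is 7 and stats.confidence is "high"
```

The running mean of the deltas is what a later simulation in the same bucket could subtract from its raw prediction as a suggested correction.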

5. Validation: Prediction vs. Reality

When users submit real-world results through the Decision Intelligence system, the comparison engine measures prediction error against what actually happened and feeds it back into calibration. For each logged outcome it computes:

Sentiment accuracy — Predicted sentiment distribution vs. actual (mean absolute error)
Conversion accuracy — Predicted conversion rate vs. actual (relative error)
Segment delta analysis — Per-segment predicted vs. actual, identifying systematic biases
Country delta analysis — Per-country prediction accuracy
Bias detection — Systematic over/underestimates flagged across 3+ campaigns
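
The first two error measures, and a simple form of bias detection, can be sketched as below. The function names are hypothetical, and the bias rule (every campaign missing in the same direction across at least 3 campaigns) is an assumed simplification of the flagging logic.

```python
from typing import Optional


def sentiment_mae(predicted: dict[str, float], actual: dict[str, float]) -> float:
    """Mean absolute error between two sentiment distributions."""
    labels = predicted.keys() | actual.keys()
    return sum(abs(predicted.get(k, 0.0) - actual.get(k, 0.0)) for k in labels) / len(labels)


def conversion_relative_error(predicted: float, actual: float) -> float:
    """Relative error of the predicted conversion rate."""
    if actual == 0:
        raise ValueError("relative error is undefined when actual is 0")
    return abs(predicted - actual) / abs(actual)


def systematic_bias(deltas: list[float], min_campaigns: int = 3) -> Optional[str]:
    """Flag a bias when all deltas (predicted - actual) share a sign (assumed rule)."""
    if len(deltas) < min_campaigns:
        return None
    if all(d > 0 for d in deltas):
        return "overestimate"
    if all(d < 0 for d in deltas):
        return "underestimate"
    return None


mae = sentiment_mae(
    {"positive": 0.6, "neutral": 0.3, "negative": 0.1},
    {"positive": 0.5, "neutral": 0.3, "negative": 0.2},
)
rel_err = conversion_relative_error(predicted=0.04, actual=0.05)
bias = systematic_bias([0.02, 0.01, 0.03])
```

In this example the sentiment MAE is about 0.067, the conversion rate is off by 20% in relative terms, and three consecutive overshoots are flagged as an overestimate.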

Formal validation study — in progress

We are not publishing a headline accuracy number until it is earned. The protocol is deliberately strict: lock a prediction before launch, wait for the real campaign, then measure the gap on outcomes the model never saw. We will publish the results — including where we were wrong — as the holdout set grows.

What we measure: Sentiment & conversion error vs. real outcomes
How: Locked pre-launch prediction vs. unseen actuals
Status: Collecting design-partner outcomes now

6. Comparison to Traditional Research

echotest is not a replacement for all market research. It is a complement that excels at speed, breadth, and early-stage hypothesis testing. Here is an honest comparison:

Dimension	Traditional Research	Generic AI (ChatGPT)	echotest
Speed	2-6 weeks	Minutes (unstructured)	~1-8 minutes (200-2,000 agents)
Cost per study	$10K-$100K	$0-$20	From $499/mo or $1,500/report
Market coverage	1-3 countries	Generic (no real data)	195 countries
Cultural depth	Basic demographics	None	6 Hofstede dims + WVS + 35 DNA fields
Sample size	500-2,000 real humans	1 model	50-2,000 structured archetypes
Repeatability	Low (different respondents)	Low (temperature variance)	High (deterministic archetypes)
Learns over time	No	No	Yes (Calibration Levels 0-3)
Statistical rigor	High (real data)	None	Moderate (Wilson intervals, reliability grades)
Ideal use case	Final validation	Quick brainstorming	Pre-launch stress testing

When to use echotest: Before you commit budget. Test 10 ideas in the time it takes to brief one focus group. Narrow your options, then validate the winner with traditional research if the stakes justify it.

7. Limitations & Honest Uncertainty

Scientific honesty is a core principle of echotest. We explicitly acknowledge the following limitations:

Synthetic, not real

Archetypes are statistical constructs, not real humans. They cannot capture individual lived experience, recent personal events, or genuine emotional states.

LLM-dependent behavior

Agent responses are generated by large language models. While culturally calibrated through persona prompts, they inherit LLM biases and limitations.

No causal claims

echotest identifies correlations and directional indicators. It does not establish causal relationships between content and outcomes.

Cultural dimensions are averages

Hofstede scores represent national averages. Individual variation within countries can be enormous. Sub-regional overrides partially address this for diverse nations (India, Nigeria, etc.).

Calibration requires feedback

The system only improves if users submit actual results. Without feedback loops, predictions remain at Level 0 (uncalibrated).

Every echotest report includes a disclaimer: “Results are directional indicators based on synthetic agent responses. Not a statistical prediction. Real-world outcomes may differ. Use results as one input among many in your decision-making process.”

8. Data Sources & Citations

Demographics

United Nations, Department of Economic and Social Affairs, Population Division (2024). World Population Prospects 2024.

Economics

World Bank. World Development Indicators (2024). GDP per capita, internet penetration, Gini index, literacy rates.

Culture

Hofstede, G. (2011). Dimensionalizing Cultures: The Hofstede Model in Context. Online Readings in Psychology and Culture, 2(1). Data: hofstede-insights.com

Values

Inglehart, R. et al. World Values Survey Wave 7 (2017-2022). worldvaluessurvey.org

Personality

McCrae, R.R. & John, O.P. (1992). An Introduction to the Five-Factor Model and Its Applications. Journal of Personality, 60(2), 175-215.

JTBD

Christensen, C.M. et al. (2016). Know Your Customers’ "Jobs to Be Done." Harvard Business Review.

Statistics

Wilson, E.B. (1927). Probable Inference, the Law of Succession, and Statistical Inference. Journal of the American Statistical Association, 22(158), 209-212.

Platforms

DataReportal (2024). Digital 2024 Global Overview Report. Social media penetration and platform-specific demographics.

Press Freedom

Reporters Without Borders (2024). World Press Freedom Index.
