Overconfidence in Chess: Lower Rated Players Overestimate Skill

A new, tightly controlled study of tournament chess players delivers a blunt—and at times unsettling—reminder: overconfidence is resilient, even in a domain built to punish it. The researchers surveyed thousands of rated players who get continuous, precise feedback on performance and still found systematic overestimation of skill, a pronounced Dunning–Kruger pattern among lower-rated players, and a striking mismatch between what players say they can do and what their ratings actually predict. The results sharpen our understanding of overconfidence bias and force a rethink of how and when confidence should guide decisions in everyday life, careers, and markets.

Background​

Chess has long been an icon for human cognition: a rule-bound contest with clear metrics, an ecosystem of ratings and norms, and a vast literature of performance analysis. That combination makes tournament chess a near-ideal laboratory for testing hypotheses about belief, feedback, and skill calibration.
Two broad claims about human self-assessment sit at the top of the literature. First, overconfidence bias—the tendency for people to overestimate the accuracy of their knowledge, the likelihood of success, or the quality of their decisions—appears across tasks and populations. Second, the Dunning–Kruger effect claims that relative incompetence breeds bigger miscalibration: people lower on the skill ladder often display the largest positive gaps between perceived and actual ability.
Both phenomena have proven robust across many domains, yet critics have raised persistent measurement concerns. Is overconfidence an artifact of poorly defined performance scales? Does sparse or noisy feedback prevent learning and recalibration? Or does the measurement method itself (for example, regressing performance onto self-ratings) create statistical illusions that look like psychological effects?
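To make that last worry concrete, here is a minimal toy simulation (illustrative numbers, not data from the study) in which both the measured rating and the self-estimate are unbiased but noisy readings of the same latent skill. Grouping players by their measured rating alone manufactures a Dunning–Kruger-like gradient:
```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

true_skill = rng.normal(1500, 200, n)            # latent ability on an Elo-like scale
rating     = true_skill + rng.normal(0, 100, n)  # measured rating = skill + noise
self_view  = true_skill + rng.normal(0, 100, n)  # self-estimate: unbiased, just noisy

# Bin players by measured rating and average (self-estimate - rating) per bin.
edges = np.quantile(rating, [0.25, 0.5, 0.75])
for q, label in enumerate(["bottom", "second", "third", "top"]):
    gap = (self_view - rating)[np.digitize(rating, edges) == q].mean()
    print(f"{label} quartile: mean self-view minus rating = {gap:+.1f}")
```
Because a low measured rating partly reflects unlucky measurement noise, the self-estimate sits above it on average in the bottom bins and below it in the top bins, with no psychology involved. A design that can rule out this artifact is therefore the interesting test.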
A controlled test that eliminates or minimizes these worries is valuable. Tournament chess provides objective, public, and frequent feedback in the form of Elo-style ratings. Those ratings are standardized, predictive of outcomes in matched play, and continuously updated. If overconfidence and the Dunning–Kruger pattern still appear in that environment, it becomes much harder to blame methodological artifacts or missing feedback.

Overview of the study and what was verified​

The research team ran two preregistered studies with a combined dataset of several thousand active, rated tournament players. Key verified facts about the research include:
  • The combined sample comprised several thousand rated players, spanning a wide age range and many years of tournament experience.
  • Participants were asked to report their current rating, state whether that rating accurately reflected their ability (and if not, whether their true ability was higher or lower), and to predict their expected results across a series of hypothetical matches against players rated above and below them.
  • On average, participants judged their ability to be substantially higher than their observed rating suggested—an average gap measured in rating points that translates into meaningful differences in expected match outcomes.
  • The overestimation was largest among lower-rated players and smallest or absent among top-rated players, consistent with a Dunning–Kruger pattern.
  • One-year follow-up rating data showed some upward movement for many players, but average improvement fell well short of the confidence expressed during the survey; only a minority of overconfident players had actually reached the rating they claimed to deserve by the one-year mark.
These core numerical claims were confirmed against the study’s registered materials and the journal abstract and reflect the paper’s central findings. At the same time, secondary summaries and press coverage occasionally reported different sample sizes and demographic summaries; where such discrepancies appear, the original registered materials and the published abstract should be treated as the authoritative source.

Methods: why chess is an unusually clean test bed​

Objective, continuous measurement​

Chess tournament ratings (Elo-style systems) are designed to predict the probable outcome of a game between any two rated players. The system’s numerical outputs have direct behavioral meaning: an X-point rating advantage corresponds to a specific expected score. This makes the measurement of “actual ability” concrete rather than subjective.
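As a concrete illustration, here is a minimal sketch of the textbook Elo expectation (the logistic form is standard, though individual federations vary the constants and adjustments):
```python
def expected_score(r_a: float, r_b: float) -> float:
    """Canonical Elo expectation for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# A 100-point edge is worth roughly a 64% expected score; 200 points, roughly 76%.
print(f"{expected_score(1600, 1500):.2f}")  # 0.64
print(f"{expected_score(1700, 1500):.2f}")  # 0.76
```
This mapping is what lets a self-reported rating gap be translated directly into an inflated expected result against a concrete opponent.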

Frequent, public feedback​

Serious tournament players receive immediate, transparent feedback: game results, tournament standings, and regular rating updates. That feedback is not merely private; it is publicly visible, allowing players to compare themselves directly against peers.

Large, heterogeneous sample​

Because chess tournaments attract players across age ranges, geographic regions, and skill levels—from local amateurs to grandmasters—the data allow researchers to examine how calibration varies across demographic groups and skill bands.

Preregistered design and replication​

A preregistered survey plus a planned replication reduces concerns about p-hacking and selective reporting. The study also used follow-up rating data collected after a year to assess whether expressed beliefs predicted future outcomes.

What the researchers found​

Magnitude and pervasiveness of overconfidence​

  • On average, players reported that their true playing ability was meaningfully higher than their current ratings indicated. The average gap corresponded to a substantial number of rating points—large enough, in practical terms, to match the edge a meaningfully stronger opponent would hold over them.
  • Across demographic subgroups examined, the pattern of overestimation was robust: it was not confined to novices of a particular age or to a geographic subset.

The Dunning–Kruger pattern reappears​

  • The largest calibration gaps were concentrated among lower-rated players. These players on average believed their ratings understated their skill by the greatest margins.
  • Higher-rated players, by contrast, were much better calibrated; the top end of the rating spectrum showed little to no systematic overconfidence.

Forecasts vs reality: follow-up shows optimism outpaces improvement​

  • The researchers compared the survey predictions against actual ratings a year later. While many players did improve somewhat, the average rating gains were markedly smaller than the gains players believed they deserved or expected.
  • Only a minority of players who declared themselves under-rated achieved their asserted “true” rating within a year, indicating that expressed optimism was typically not realized in the short-to-medium term.

Why this matters: theoretical and practical implications​

Theoretical implications​

  • Persistence of bias in an information-rich environment: The study demonstrates that overconfidence can persist even when objective, precise, and public feedback is available. This challenges explanations that rely solely on information scarcity or noisy feedback.
  • Calibration is not merely an information problem: If feedback alone were enough to calibrate beliefs, tournament chess should show little overconfidence. The data suggest important roles for other mechanisms: motivational factors, biased self-attribution, selective attention to wins, and cognitive shortcuts that favor positive self-concepts.
  • Dunning–Kruger survives stricter tests: The amplification of miscalibration among low-skilled players appears to be a genuine psychological phenomenon and not purely a side-effect of measurement in domains with imprecise performance scales.

Practical implications​

  • Confidence is a mixed asset: In domains where confidence influences performance (for example, public speaking, sales, or sports), a positive bias can be adaptive by promoting persistence and risk-taking. In contrast, when confidence drives resource allocation decisions—investments, hiring, or bidding—overconfidence can be costly.
  • Forecasting and planning require objective anchors: For decisions with financial or career consequences, relying on subjective belief without anchoring to objective metrics is risky. The study reinforces the value of benchmarking, third-party evaluation, and precommitment to rules based on measurable performance.
  • Training interventions should target cognitive and motivational drivers: Simple increases in feedback frequency or precision may be insufficient. Interventions may need to address how people interpret feedback, the salience of failures, and the attribution processes that insulate self-concept from disconfirming evidence.

Strengths of the research​

  • Large sample and diversity: The dataset spans thousands of active players, providing power to detect patterns and to examine subgroup differences.
  • Objective performance metric: Using established rating systems avoids subjective performance scales and yields directly interpretable results in rating points and match probability.
  • Preregistered approach and replication: These reduce risks of reporting bias and increase confidence in the robustness of findings.
  • Longitudinal follow-up: Comparing claims to actual ratings a year later gives the paper a predictive test rarely available in typical overconfidence studies.
  • Replication of key patterns: The Dunning–Kruger style calibration gradient reappears even under these favorable measurement conditions.

Limitations and cautions​

No study is definitive. Several caveats deserve attention:
  • Selection effects: Tournament players are not a random sample of the population. They are self-selected people who care about ratings and competition; the psychology of recreational players or non-rated hobbyists might differ.
  • Temporal dynamics and aspiration: Some overstatements of “true” ability may reflect optimism about future improvement rather than a static miscalibration. The follow-up showed improvement for many, suggesting that expressed beliefs sometimes signal intention to improve, not merely misperception. However, the average improvement fell well short of expressed estimates.
  • Measurement and rating-system changes: Rating systems themselves evolve—administrative changes or recalibrations in rating formulas can move numbers independently of underlying skill. Such systemic changes could influence results if they coincide with the study period.
  • Self-report biases: Although the study anchored ability to objective ratings, some questionnaire items remained subjective (e.g., whether a rating feels accurate), which can still reflect identity or social-desirability processes.
  • External validity beyond chess: Chess’s clarity and feedback richness make it an excellent test case, but other domains—creative work, managerial performance, or interpersonal skills—have fuzzier feedback and different motivational structures, which may amplify or alter patterns of miscalibration.
Where reporting of the study in the popular press made numerical claims that diverged from the registered materials and the published abstract, readers should treat those press numbers cautiously. The version of the results that is anchored to preregistered materials and the journal’s abstract is the authoritative baseline.

Mechanisms that may explain persistent overconfidence​

Several psychological processes likely conspire to keep overconfidence alive even when feedback is abundant.

Motivational self-enhancement​

People have a motivation to see themselves positively. Positive self-views can foster persistence, resilience, and social standing. That motivation can bias interpretation of ambiguous outcomes and promote selective memory for success.

Biased feedback interpretation​

Players may overweight wins against higher-rated opponents and underweight losses or draws. Confirmation bias and selective sampling of friendly information can preserve inflated beliefs.

Forecasting optimism​

Some participants may express where they believe they will be after additional study rather than where they are now. Optimism about future learning is sensible in many contexts but becomes misleading when stated as a present ability.

Cognitive illusions and heuristics​

Human cognition relies on heuristics that are efficient but imperfect. Incompetent individuals may lack the meta-cognitive tools needed to notice their own deficits—a key point of the Dunning–Kruger framework.

Social identity and signaling​

Public ratings are social signals. Players may narrate their ratings as “too low” as a form of status signaling or identity maintenance, especially when ratings matter to one’s community standing.

Practical tips for readers: how to keep confidence useful and avoid costly overconfidence​

  • Anchor beliefs to objective metrics. Use reliable, external measures (benchmarks, ratings, third-party assessments) where possible.
  • Use probabilistic thinking. Instead of blanket statements (“I’m better than my rating”), translate beliefs into probabilities and expected outcomes.
  • Calibrate by forecasting then checking. Make short-term predictions you can test (e.g., “I will gain X rating points in six months”), then compare outcomes and adjust.
  • Prioritize loss-limiting rules for high-stakes decisions. For investments or large bets, set pre-specified stop-loss and allocation rules that don’t rely on subjective confidence.
  • Seek disconfirming feedback. Ask mentors or peers for rigorous critique and look for repeated patterns rather than isolated results.
  • Train metacognitive skills. Practices like reflective journaling, post-mortem analyses, and structured forecasting training can improve calibration.
  • Adopt ensemble judgments. Aggregate multiple independent forecasts or assessments rather than relying on a single subjective judgment.
Numbered quick-start calibration steps:
  1. Choose a measurable target tied to objective performance.
  2. Make a concrete probabilistic forecast with a time horizon.
  3. Gather unbiased feedback at defined intervals.
  4. Compare prediction to outcome and adjust forecasting methods (a sketch of this step follows below).
  5. Repeat the cycle to build calibration over time.
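Here is a minimal sketch of step 4 in code, using made-up forecast records rather than anything from the study: log each probabilistic prediction with its outcome, then score yourself.
```python
# Each record: (stated probability of success, actual outcome as 1/0).
records = [(0.9, 1), (0.8, 0), (0.7, 1), (0.6, 0), (0.9, 1), (0.8, 1)]

# Brier score: mean squared gap between stated probability and outcome.
brier = sum((p - y) ** 2 for p, y in records) / len(records)
print(f"Brier score: {brier:.3f} (0.0 is perfect; 0.25 is always saying 50%)")

# Crude calibration check: among confident forecasts, how often were you right?
confident = [y for p, y in records if p >= 0.8]
print(f"Hit rate at >=80% confidence: {sum(confident) / len(confident):.0%}")
```
If your hit rate at a given confidence level keeps landing below that level, the same overconfidence the chess data documents is showing up in your own forecasts, and your stated probabilities should be shaded down.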

Broader implications: from behavioral science to markets and policy​

The persistence of overconfidence in an information-rich domain has practical ripple effects:
  • Financial markets and consumer finance: Overconfidence by individuals can drive excessive trading, under-diversified portfolios, and risky credit decisions. Policy designs that presume rational updating from feedback should be tempered with the reality of stubborn optimism.
  • Workplace decision making: Hiring and promotion often hinge on self-assessments and interviews. Structured assessments and performance trials can counteract inflated self-appraisal.
  • Education and training: Skill development often relies on learners’ ability to accurately appraise progress. Teachers and coaches should design feedback systems that emphasize disconfirming evidence and deliberate practice.
  • Forecasting tournaments and expert panels: Aggregated forecasting and accountability mechanisms improve prediction accuracy. The study suggests that even skilled communities can be biased; institutional designs should incorporate calibration checks.

What the study does not prove—and open questions​

  • The research does not show that overconfidence is always harmful. In domains where confidence causally improves performance, some degree of optimism may be beneficial.
  • The study cannot fully disentangle whether stated beliefs reflected present miscalibration or optimistic forecasts about the future. Longitudinal designs with repeated measures could sharpen that distinction.
  • It remains uncertain how cultural, institutional, or incentive changes would alter the pattern. For example, if rating systems become more or less rewarding, or if community norms shift around self-promotion, calibration could change.
  • The mechanisms by which feedback fails to correct beliefs require experimental dissection: does feedback fail due to motivational defense, cognitive noise, or information processing limits?

Conclusion​

This tightly designed investigation in tournament chess refines a troubling insight: accurate information alone is not a cure for human overconfidence. Even in an arena where outcomes are public, precise, and continuously recorded, people—especially those lower on the skill ladder—tend to overestimate their abilities and to expect improvements larger than they ultimately realize. For individuals, the takeaway is practical: treat confidence as an input, not a substitute for objective evidence. For organizations and policymakers, the finding underscores the value of structured measurement, calibrated incentives, and accountability mechanisms where misjudgment carries real costs.
Overconfidence is not merely an intellectual curiosity; it shapes careers, investments, and social outcomes. Understanding where it survives, why it persists, and how to design systems that harness the benefits of confidence while constraining its costs is a pressing challenge for behavioral science—and for anyone making consequential decisions under uncertainty.

Source: Psychology Today, Using Chess to Study Overconfidence