Transparency

Exactly how this test works.

No black box. Here is the instrument, the scoring math, the norms, and the honest caveats – enough that a psychometrician could reproduce it.

The instrument

We use items from the International Personality Item Pool (IPIP) – a public-domain bank of validated items (Goldberg et al.) – in the IPIP-NEO-120 arrangement developed by John A. Johnson (2014), which measures the five domains and all thirty facets of the Five-Factor Model. The in-depth test administers all 120 items (four per facet). The quick test administers 60 (two per facet).

Responses use the standard IPIP 5-point accuracy scale (very inaccurate → very accurate). Each facet contains a balance of true-keyed and reverse-keyed items to limit acquiescence (yes-saying) bias; reverse-keyed items are flipped before scoring.

Scoring

  1. Reverse-keyed items are recoded (1↔5, 2↔4; 3 is unchanged).
  2. Item responses are summed within each facet, and the six facet scores are summed within each domain.
  3. Each raw score is compared to a normative mean and standard deviation to get a z-score, which becomes a percentile (assuming an approximately normal reference distribution).
  4. For the quick test, each facet’s 2-item score is projected onto the 4-item scale before norming – so both versions report on the same percentile metric, with the quick version carrying wider confidence bands.
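The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: the function names are hypothetical, the norm values in the usage note are invented, and the "projection" of a 2-item score onto the 4-item scale is assumed here to be simple doubling, which the text does not confirm.

```python
from statistics import NormalDist
from typing import List

SCALE_MAX = 5  # 5-point accuracy scale, coded 1..5


def recode(response: int, reverse_keyed: bool) -> int:
    """Step 1: flip reverse-keyed items so 1<->5 and 2<->4 (3 is unchanged)."""
    return (SCALE_MAX + 1 - response) if reverse_keyed else response


def facet_raw_score(responses: List[int], reverse_flags: List[bool]) -> int:
    """Step 2a: sum the recoded item responses for one facet."""
    return sum(recode(r, flag) for r, flag in zip(responses, reverse_flags))


def domain_raw_score(facet_scores: List[float]) -> float:
    """Step 2b: sum the facet scores that belong to one domain."""
    return sum(facet_scores)


def percentile(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Step 3: raw score -> z-score -> percentile under a normal reference."""
    z = (raw - norm_mean) / norm_sd
    return 100.0 * NormalDist().cdf(z)


def project_quick_score(two_item_sum: int) -> float:
    """Step 4. ASSUMPTION: map a 2-item sum (2..10) onto the 4-item
    metric (4..20) by doubling; the real projection may differ."""
    return two_item_sum * 2.0
```

For example, responses of 4, 2, 5, 1 with the second and fourth items reverse-keyed recode to 4, 4, 5, 5, for a facet sum of 18. Against an invented norm of mean 14 and SD 3, that is z ≈ 1.33, or roughly the 91st percentile.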

Norms

Percentiles are computed against Johnson’s IPIP-NEO-120 normative data, stratified by sex and four age bands. If you tell us your sex and age band, we use the matching norm group; if you don’t, we use a pooled reference (the average across all eight groups), which we label on your report. As this site collects its own (anonymous) responses, we intend to publish McDonald’s ω (an internal-consistency reliability coefficient) and our own norms here once the sample is large and demographically broad enough to be trustworthy.
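The norm-group lookup and its pooled fallback might look like the sketch below. Everything specific in it is an assumption: the numbers are invented placeholders rather than Johnson's values, the age bands are labelled 1 to 4 because the text does not name them, and "average across all eight groups" is read as an unweighted average of the group means and SDs.

```python
from statistics import mean
from typing import Dict, Optional, Tuple

NormKey = Tuple[str, int]        # (sex, age band 1..4)
Norm = Tuple[float, float]       # (mean, SD) for one facet or domain

# HYPOTHETICAL placeholder norms for a single facet; not real data.
NORMS: Dict[NormKey, Norm] = {
    ("female", 1): (14.0, 3.0), ("female", 2): (14.5, 3.1),
    ("female", 3): (15.0, 3.2), ("female", 4): (15.5, 3.3),
    ("male", 1): (13.0, 3.0), ("male", 2): (13.5, 3.1),
    ("male", 3): (14.0, 3.2), ("male", 4): (14.5, 3.3),
}


def pooled_norm(norms: Dict[NormKey, Norm]) -> Norm:
    """ASSUMPTION: pool by unweighted averaging of group means and SDs."""
    return (mean(m for m, _ in norms.values()),
            mean(s for _, s in norms.values()))


def select_norm(norms: Dict[NormKey, Norm],
                sex: Optional[str] = None,
                age_band: Optional[int] = None) -> Tuple[float, float, str]:
    """Return (mean, SD, label); fall back to the pooled reference
    when sex or age band is missing or unrecognised."""
    if sex is not None and age_band is not None and (sex, age_band) in norms:
        m, s = norms[(sex, age_band)]
        return m, s, f"{sex}, age band {age_band}"
    m, s = pooled_norm(norms)
    return m, s, "pooled reference"
```

Returning the label alongside the numbers keeps the report honest about which reference group a percentile was computed against.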

Confidence bands

No score is a point; it’s an estimate with error. We draw a 95% confidence band as ±1.96 × the standard error of measurement, where SEM = SD × √(1 − reliability). Pending our own ω, we use conservative reliabilities (≈ .85 for domains, ≈ .70 for facets, reduced by the Spearman-Brown formula for the shorter quick form). Conservative values mean slightly wider bands – we’d rather understate precision than overstate it.
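As a rough sketch of that arithmetic: the Spearman-Brown formula predicts reliability when a test's length is multiplied by a factor k, and the quick form has half the items, so k = 0.5 is used here. The function names and the SD of 3 in the usage note are illustrative assumptions, and the band is computed in raw-score units.

```python
from math import sqrt
from typing import Tuple


def spearman_brown(reliability: float, length_ratio: float) -> float:
    """Predicted reliability when test length is multiplied by length_ratio."""
    return (length_ratio * reliability) / (1.0 + (length_ratio - 1.0) * reliability)


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1.0 - reliability)


def confidence_band(score: float, sd: float, reliability: float,
                    z_crit: float = 1.96) -> Tuple[float, float]:
    """Symmetric band of +/- z_crit * SEM around the observed score."""
    half_width = z_crit * sem(sd, reliability)
    return score - half_width, score + half_width


# Facet reliability of .70 on the full form, halved in length for the quick form.
full_facet_rel = 0.70
quick_facet_rel = spearman_brown(full_facet_rel, 0.5)   # about 0.54
```

With a hypothetical facet SD of 3 raw points, the full-form band is about ±3.2 points and the quick-form band about ±4.0, which is the widening the quick version carries. The same formula takes the assumed domain reliability of .85 down to about .74 at half length.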

Attention checks

Two items ask you to select a specific response (“please choose ‘very inaccurate’”). Failing both flags the result as low-quality and prints a warning on your report; these flags also let us exclude careless responses when computing norms.
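A minimal sketch of that flagging rule, assuming responses are stored as a mapping from item id to the chosen value (1 to 5). The item ids and the required answers below are hypothetical; only the rule itself, a flag when both checks are answered wrongly, comes from the description above.

```python
from typing import Dict, List

# HYPOTHETICAL attention-check items: item id -> response the item asks for.
ATTENTION_CHECKS: Dict[str, int] = {
    "check_a": 1,   # "please choose 'very inaccurate'"
    "check_b": 5,   # assumed second check asking for 'very accurate'
}


def failed_checks(responses: Dict[str, int]) -> List[str]:
    """List the attention-check items that were skipped or answered wrongly."""
    return [item for item, required in ATTENTION_CHECKS.items()
            if responses.get(item) != required]


def is_low_quality(responses: Dict[str, int]) -> bool:
    """Flag the result only when every attention check was failed."""
    return len(failed_checks(responses)) == len(ATTENTION_CHECKS)
```

Under this rule a respondent who slips on one check is not flagged, so `is_low_quality({"check_a": 1, "check_b": 3})` is `False`, while failing both returns `True`.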

Honest limits

This is self-report, which is subject to social-desirability bias and to the limits of self-insight. The Big Five predict individual outcomes only moderately. The pooled norm is an approximation. And percentiles assume normality, which is reasonable but imperfect at the extremes. We show you the bands so these limits are visible, not hidden.

Sources & licensing

IPIP items are public domain (attribution: Goldberg, L. R., et al., International Personality Item Pool, ipip.ori.org). The 120-item arrangement and norms follow Johnson (2014). See our about & sources page.

Selected sources

  1. Johnson, J. A. (2014). Measuring thirty facets of the Five Factor Model with a 120-item public-domain inventory: development of the IPIP-NEO-120. Journal of Research in Personality, 51, 78–89.
  2. Goldberg, L. R., et al. (2006). The International Personality Item Pool and the future of public-domain personality measures. Journal of Research in Personality, 40, 84–96.
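  3. Maples-Keller, J. L., et al. (2019). Using item response theory to develop a 60-item representation of the NEO PI-R using the International Personality Item Pool: development of the IPIP-NEO-60. Journal of Personality Assessment, 101(1), 4–15.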
  4. McDonald, R. P. (1999). Test Theory: A Unified Treatment. Lawrence Erlbaum Associates.

This is a tool for self-understanding, not a clinical, diagnostic, hiring, or other high-stakes instrument. It does not diagnose any condition. Results describe where you fall relative to a reference sample – they are estimates with error, not verdicts. See our ethics & limits.