An honest take

MBTI, Enneagram, DISC & Color Code.

They’re fun, they’re everywhere, and they’re weaker science than their popularity suggests. Here’s the candid version – including the real things each one does well, and how we borrow those honestly.

Our objection isn’t snobbery. It’s that typing, meaning sorting people into discrete boxes, throws away information and often puts the same person in a different box on retest. Traits vary continuously, so forcing a line down the middle turns a near-tie into a confident, sticky label.
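To make the near-tie problem concrete, here is a minimal sketch under classical test theory assumptions (ours, for illustration): an observed score is a true score plus independent, normally distributed error, scores are standardized, and the cut sits at 0. The function name and the reliability of .85 are hypothetical choices, not values taken from any particular instrument.

```python
from math import erf, sqrt


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def disagreement_probability(true_z: float, reliability: float) -> float:
    """Chance that two administrations land on opposite sides of a cut at 0.

    Assumes observed = true + error, observed-score variance 1, and
    error variance (1 - reliability), with independent errors each time.
    true_z is the person's true score in observed-score SD units.
    """
    error_sd = sqrt(1.0 - reliability)
    p_above = normal_cdf(true_z / error_sd)  # P(one sitting lands above the cut)
    return 2.0 * p_above * (1.0 - p_above)   # above-then-below or below-then-above


# Someone sitting exactly on the line gets a coin-flip label;
# someone far from the line almost never changes.
on_the_line = disagreement_probability(0.0, reliability=0.85)
far_from_line = disagreement_probability(1.5, reliability=0.85)
print(f"on the line: {on_the_line:.2f}, far from the line: {far_from_line:.4f}")
```

Under these assumptions the label is least stable exactly where it is least informative: close to the cut.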

| System | Structure | Reliability | Validity |
| --- | --- | --- | --- |
| Big Five | 5 continuous traits | Test-retest .80–.90 | Strong, cross-cultural, predictive |
| MBTI | 16 types (4 dichotomies) | 39–76% switch type on retest | Weak for outcomes (r ≈ .10–.20) |
| Enneagram | 9 types | Mixed | No replicated 9-factor structure |
| DISC | 4 ipsative styles | Acceptable internal consistency | Dimensions not independent; reduce to Big Five |
| Color Code | 4 “core motives” | Acceptable retest | Low construct validity vs. Big Five |

MBTI (Myers-Briggs)

Built in the 1940s on Jung’s Psychological Types, the MBTI assigns a four-letter type from four dichotomies: E/I, S/N, T/F, and J/P. The problems are well documented. Between 39% and 76% of people get a different type when retested within five weeks. Scores on each dimension are roughly normally distributed rather than bimodal, so most people sit near the middle and there’s no natural place to cut a “type”. And predictive validity for job performance is weak. Defenders note that the underlying continuous scores are reasonably reliable – which is precisely the point: the scores are fine; the type assignment is what fails.
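A back-of-the-envelope sketch shows how reliable scores and unstable types can coexist. It assumes (our simplification, not how the retest figures above were obtained) that test and retest scores are bivariate normal, that each cut sits at the population median, and that the four dichotomies are independent and equally reliable. Under those assumptions the chance that one letter flips is arccos(r)/π, where r is the retest correlation of the continuous score. The function names are hypothetical.

```python
from math import acos, pi


def letter_flip_rate(retest_r: float) -> float:
    """P(one dichotomy changes side on retest) for a median cut.

    For bivariate normal scores with correlation r, the probability that
    the two scores fall on opposite sides of the median is arccos(r) / pi.
    """
    return acos(retest_r) / pi


def type_change_rate(retest_r: float, n_dichotomies: int = 4) -> float:
    """P(at least one letter changes), assuming independent dichotomies."""
    all_letters_stay = (1.0 - letter_flip_rate(retest_r)) ** n_dichotomies
    return 1.0 - all_letters_stay


for r in (0.80, 0.90):
    print(f"r = {r:.2f}: letter flips {letter_flip_rate(r):.0%}, "
          f"type changes {type_change_rate(r):.0%}")
```

With a retest correlation of .80 on every scale, this toy model has about 60% of people changing at least one letter, which is why respectable scale reliability does not rescue the four-letter type.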

What it does well, honestly borrowed: a shared vocabulary, story-shaped feedback, and motivating identity framing. And its four axes map fairly cleanly onto four of the Big Five traits (Neuroticism has no MBTI counterpart) – which is why your result page offers an MBTI translation of your dimensional scores, caveats attached.
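As a rough illustration of what such a translation could look like, the sketch below turns Big Five percentiles into a four-letter label and flags the letters that are too close to call. The trait-to-axis pairing follows the correspondence commonly reported in the literature; the function name, the 50th-percentile cut, and the 10-point tie band are hypothetical choices for this example, not a description of the actual result page.

```python
from typing import Dict, List

# Assumed pairing: (Big Five trait, letter if high, letter if low).
# Neuroticism is left out because the MBTI has no matching axis.
BRIDGE = [
    ("extraversion", "E", "I"),
    ("openness", "N", "S"),
    ("agreeableness", "F", "T"),
    ("conscientiousness", "J", "P"),
]


def mbti_bridge(percentiles: Dict[str, float], tie_band: float = 10.0) -> Dict[str, object]:
    """Translate percentiles into a four-letter label plus near-tie caveats."""
    letters: List[str] = []
    near_ties: List[str] = []
    for trait, high_letter, low_letter in BRIDGE:
        p = percentiles[trait]
        letters.append(high_letter if p >= 50.0 else low_letter)
        # Within tie_band points of the cut, the letter is too close to call.
        if abs(p - 50.0) <= tie_band:
            near_ties.append(f"{high_letter}/{low_letter}")
    return {"label": "".join(letters), "near_ties": near_ties}


result = mbti_bridge({
    "extraversion": 72.0,
    "openness": 46.0,
    "agreeableness": 55.0,
    "conscientiousness": 18.0,
})
print(result)  # {'label': 'ESFP', 'near_ties': ['N/S', 'F/T']}
```

Returning the near-ties alongside the label keeps the caveat attached to the translation instead of leaving it in the fine print.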

The Enneagram

Nine types with roots in spiritual traditions, popularized in the 20th century. A review of 104 samples (Hook et al., 2021) found “mixed evidence of reliability and validity” – some overlap with the Big Five, but no consistent factor-analytic support for the nine-type structure and no peer-reviewed evidence of criterion validity. Genuine value: rich self-reflection prompts and a language for talking about motivation – useful for insight, not for classification.

DISC

From Marston’s 1920s work (never validated by Marston himself). Independent analysis (e.g., Martinussen et al., 2001) found that DISC’s dimensions aren’t psychometrically independent and are better explained as combinations of Big Five traits. Most validation comes from vendors, not independent peer review. Genuine value: a simple, memorable workplace-communication vocabulary.

Color Code

Four “driving core motives” – Red (power), Blue (intimacy), White (peace), Yellow (fun). The first peer-reviewed study (Ault & Barney, 2007) found acceptable retest reliability but low construct validity against established measures, cautioning against individual classification. Genuine value: the motive framing (“why you do what you do”) overlaps with real Big Five motivational facets like Achievement-Striving and Excitement-Seeking.

Our approach

We keep what these systems do well – shared language, narrative, motivation – and drop what they do badly – discrete typing of continuous traits. Your report speaks in dimensions and percentiles, then offers type “bridges” as clearly labelled translations, never as the measurement itself.
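For a sense of what reporting in percentiles with error can mean in practice, here is a minimal sketch. It assumes a normally distributed reference sample and the classical standard error of measurement, SD × √(1 − reliability); the function name and the example numbers (mean 50, SD 10, reliability .85) are hypothetical, not the scoring actually used for the report.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def percentile_with_band(
    raw: float,
    ref_mean: float,
    ref_sd: float,
    reliability: float,
    z_crit: float = 1.96,  # roughly a 95% band
) -> Tuple[float, float, float]:
    """Return (percentile, lower bound, upper bound) for a raw scale score.

    The band is raw +/- z_crit * SEM, converted to percentiles of an
    assumed normal reference distribution.
    """
    sem = ref_sd * sqrt(1.0 - reliability)  # standard error of measurement
    reference = NormalDist(mu=ref_mean, sigma=ref_sd)

    def to_percentile(score: float) -> float:
        return 100.0 * reference.cdf(score)

    return (
        to_percentile(raw),
        to_percentile(raw - z_crit * sem),
        to_percentile(raw + z_crit * sem),
    )


point, low, high = percentile_with_band(raw=55.0, ref_mean=50.0, ref_sd=10.0, reliability=0.85)
print(f"about the {point:.0f}th percentile (plausible range {low:.0f} to {high:.0f})")
```

Even a fairly reliable scale gives a wide band near the middle of the distribution, which is the practical reason to show a range rather than a box.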

Selected sources

  1. Pittenger, D. J. (2005). Cautionary comments regarding the Myers-Briggs Type Indicator. Consulting Psychology Journal: Practice and Research, 57(3), 210–221.
  2. Hook, J. N., et al. (2021). Reliability and validity of the Enneagram: a systematic review. Journal of Clinical Psychology.
  3. Martinussen, M., Richardsen, A. M., & Vårum, H. W. (2001). Validation of an ipsative personality measure (DISCUS). Scandinavian Journal of Psychology, 42(5), 411–416.
  4. Ault, R. L., & Barney, S. T. (2007). Construct validity of the Hartman Color Code. International Journal of Selection and Assessment.

This is a tool for self-understanding, not a clinical, diagnostic, hiring, or other high-stakes instrument. It does not diagnose any condition. Results describe where you fall relative to a reference sample – they are estimates with error, not verdicts. See our ethics & limits.