“Philosophy of science without history of science is empty; history of science without philosophy of science is blind.” — Imre Lakatos
Statistics isn’t just a collection of mathematical techniques—it’s a way of thinking about the world, addressing uncertainty, and drawing conclusions from incomplete information. As data scientists, machine learning engineers, and AI practitioners, we often apply statistical methods without reflecting on their theoretical foundations. Yet our work implicitly embodies philosophical stances about knowledge, evidence, and inference.
This series presents foundational readings that shed light on the philosophical aspects of statistics. They are not intended to turn data practitioners into philosophers, but to offer accessible ways to reflect on the assumptions that underlie our daily work.
The History and Philosophy of Statistical Practice
“Science and Statistics” by George E.P. Box
Box’s 1976 paper is a masterpiece that weaves together reflections on the scientific method, experimental design, and the historical development of statistics, all told through the story of R.A. Fisher’s pioneering work at the Rothamsted Experimental Station. The paper examines the scientific method—particularly experimentation—as “a motivated iteration between theory and practice.”
Box highlights Fisher’s remarkable contributions at Rothamsted to demonstrate how statistical thinking evolves through practical scientific challenges. He explores Fisher’s “Studies in Crop Variation” series (I-VI), which details Fisher’s journey from analyzing agricultural data to developing a comprehensive theory of experimental design. Through these papers, we witness the emergence of analysis of variance, randomization principles, and factorial designs—all arising from Fisher’s confrontations with real agricultural problems.
The paper recounts famous episodes from Fisher’s work, including the legendary tea-tasting experiment in which a woman claimed she could tell whether milk or tea was added first to a cup. Rather than dismissing this claim, Fisher designed an experiment to test it rigorously—illustrating how sound experimental principles can be applied even to seemingly trivial questions (a sketch of the arithmetic behind that test appears below).

Box also shows how Fisher’s practical mindset led him to revolutionary insights: “Fisher was perplexed by the shapes of his fitted yield graphs. These showed a pattern of significant slow changes common to all the 13 Broadbalk plots… He speculates, ‘Of all the organic factors which influence the yield of wheat it is probable that weeds alone change sufficiently slowly to explain the changes at Broadbalk.’” This kind of practical reasoning, rooted in deep domain knowledge, exemplifies the scientific thinking Box advocates.
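Box describes the tea-tasting design only in outline, but the arithmetic behind it is easy to reconstruct. Here is a minimal sketch, assuming the standard telling of the experiment: eight cups, four prepared each way, with the lady asked to identify the four milk-first cups. Under the null hypothesis of pure guessing, the number of correct identifications follows a hypergeometric distribution.

```python
from math import comb

def p_at_least_k_correct(k, n_cups=8, n_milk_first=4):
    """P(pure guessing identifies >= k of the milk-first cups)."""
    total = comb(n_cups, n_milk_first)  # ways to pick 4 cups out of 8
    favorable = sum(
        comb(n_milk_first, j) * comb(n_cups - n_milk_first, n_milk_first - j)
        for j in range(k, n_milk_first + 1)
    )
    return favorable / total

print(p_at_least_k_correct(4))  # all four correct: 1/70 ≈ 0.014
print(p_at_least_k_correct(3))  # three or more correct: 17/70 ≈ 0.243
```

Getting all four cups right by chance alone has probability 1/70, while three out of four is unremarkable—which is exactly why Fisher’s design can lend real evidential weight to such a modest experiment.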
While “Science and Statistics” assumes some familiarity with statistical concepts, its philosophical insights are still accessible. Spanning around a dozen pages, it is filled with historical anecdotes and methodological reflections that illuminate the foundations of statistical thinking.
Perhaps the most compelling section comes near the end, where Box warns against two pathologies in statistical practice: “cookbookery” and “mathematistry.” Cookbookery refers to the mindless application of statistical techniques without considering their appropriateness or the true objectives of the investigation. Mathematistry pertains to mathematical developments lacking practical relevance—“theory for theory’s sake, which, since it seldom touches down with practice, has a tendency to redefine the problem rather than solve it.”
These warnings remain strikingly relevant today, when we often see machine learning techniques applied without carefully considering their suitability or theoretical frameworks developed with minimal connection to practical problems. Box reminds us that theory and practice must inform each other for genuine progress.
“Statistical Concepts in Philosophy of Science” by Patrick Suppes
Suppes’ paper provides a captivating historical and philosophical viewpoint on the evolution of statistical concepts and their central role in scientific inquiry. At its core is the acknowledgment that statistics originated from the practical necessity to manage variation in observations and measurement errors—challenges that scientists have faced since antiquity.
The paper begins by tracing this evolution from Ptolemy’s Almagest (150 CE) to the works of Laplace and Gauss. While early astronomers acknowledged the issue of observational errors, they lacked systematic quantitative methods to address them. Suppes cites Gauss’s pivotal 1821 work on the theory of least squares, where he differentiates between “irregular or random” errors (stemming from imperfections in our senses or instruments) and “constant or regular” errors (systematic biases that similarly affect all observations). This distinction remains fundamental to statistical thinking today.
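Gauss’s distinction is easy to demonstrate with a short simulation (my own illustration, not from Suppes’ paper): averaging repeated measurements drives the random component toward zero, but no amount of replication removes a constant bias.

```python
import numpy as np

rng = np.random.default_rng(42)
true_value = 10.0  # the quantity we are trying to measure

# "Irregular or random" errors: zero-mean noise from senses or instruments.
random_errors = rng.normal(loc=0.0, scale=0.5, size=10_000)

# A "constant or regular" error: a fixed bias affecting every observation,
# e.g. a miscalibrated instrument.
systematic_bias = 0.3

observations = true_value + random_errors + systematic_bias

# The sample mean (the least-squares estimate of a single constant)
# converges to true_value + systematic_bias, not to true_value.
print(observations.mean())  # ≈ 10.3, however large the sample
```

This is why the distinction still matters: statistical machinery controls random error, while systematic error must be diagnosed and corrected at the level of the measurement process itself.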
What makes Suppes’ account particularly valuable is how it connects these historical developments to more profound philosophical questions about the nature of scientific knowledge. He demonstrates that statistics is not just a technical tool but a conceptual framework that influences how we understand evidence, confirmation, and scientific progress. The rise of formal statistical methods in the mid-19th century dramatically transformed how scientists evaluated their theories.
The paper also illustrates how statistical concepts play different roles across scientific disciplines. In physics, probability appears in theoretical descriptions of phenomena such as radioactive decay. In social sciences, statistical methods are crucial for experimental design, primarily through innovations like control groups—as Suppes memorably notes, “It is unimaginable that physicists would run experiments with ‘experimental’ electrons and ‘control’ electrons.”
Suppes’ thorough discussion of chi-square tests as practical tools for evaluating scientific hypotheses is valuable for contemporary data practitioners. He walks through tests for stationarity, order determination, goodness-of-fit, and process homogeneity, demonstrating how these statistical procedures help scientists distinguish genuine patterns from random fluctuations.
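To make one of these concrete, here is a minimal goodness-of-fit example (with hypothetical counts, not drawn from Suppes’ paper, and assuming scipy is available): testing whether 600 rolls of a die are consistent with a fair, uniform model.

```python
from scipy.stats import chisquare

# Observed counts for faces 1-6 over 600 rolls (hypothetical data).
observed = [92, 108, 101, 97, 110, 92]
expected = [100] * 6  # fair-die null hypothesis

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")
# chi-square ≈ 3.02, p ≈ 0.70: these counts are consistent with a fair die.
# A small p-value would instead flag a genuine departure from uniformity.
```

The other chi-square procedures Suppes walks through follow the same pattern, differing mainly in how the expected counts are constructed.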
“The Lady Tasting Tea” by David Salsburg

Salsburg’s book makes the history of statistics come alive through engaging stories of the field’s pioneers. Unlike the papers by Box and Suppes, which require some statistical background, “The Lady Tasting Tea” assumes no prior knowledge, making it perfect for readers seeking a gentler introduction to statistical thinking.
The book takes its title from the same tea-tasting experiment referenced by Box, but Salsburg expands it into a full narrative that captures Fisher’s personality and scientific approach. Through biographical sketches, we meet the remarkable individuals who shaped modern statistics—not just Fisher but also William Sealy Gosset, Karl Pearson, Jerzy Neyman, and many others whose methods we still utilize daily.
Salsburg’s account is particularly valuable because it situates statistical developments within their historical contexts. We see how agricultural research at Rothamsted drove Fisher’s innovations in experimental design, how industrial quality control inspired Shewhart’s control charts, and how medical research influenced modern clinical trial methodologies.
“Thinking About Statistics: The Philosophical Foundations” by Jun Otsuka

If you read only one book on the philosophy of statistics, make it this one. Otsuka’s compact yet comprehensive treatment (under 190 pages) provides a uniquely integrated view of the major statistical frameworks that shape modern data science and AI.
What sets this book apart is its innovative organization. Unlike traditional treatments that begin with frequentist approaches, Otsuka starts with a primer on descriptive and inferential statistics before addressing Bayesian statistics directly. This deliberate sequencing reflects a conceptual rather than a historical approach to understanding statistical thinking. Classical frequentist statistics follows the Bayesian chapter, allowing readers to compare and contrast the two frameworks more easily.
Otsuka’s treatment of machine learning offers significant value to current practitioners, covering traditional topics such as model selection and overfitting alongside modern methods like deep learning. This bridges the often separate realms of traditional statistics and contemporary AI, allowing readers to recognize their philosophical connections. The book then delves into causal inference, an increasingly vital area at the intersection of statistics and AI.
The final chapter on the ontology, semantics, and epistemology of statistics ties everything together by addressing fundamental questions about what statistical models represent, how they generate meaning, and what kind of knowledge they provide. This philosophical synthesis reveals how various statistical approaches operate at different “ontological layers”:
- Data forms the most basic layer—what we directly observe.
- Probability models represent the latent structure generating data.
- Causal models capture relationships between possible worlds, enabling us to reason about interventions.
These layers require increasingly strong assumptions while providing greater explanatory power. Descriptive statistics remains within the realm of data, while inferential statistics ties data to probability models, and causal inference connects probability models to causal structures. Otsuka also frames the Bayesian-frequentist debate in epistemological terms: Bayesian statistics parallels internalist epistemology (focused on coherence among beliefs), while classical statistics resembles externalist epistemology (concerned with the reliability of processes). Each approach faces its own challenges in ensuring that its form of justification leads to truth.
Most provocatively, Otsuka explores how modern machine learning may transform statistical epistemology. While traditional statistics adheres to a foundationalist ideal—drawing conclusions from explicit theories and principles—deep learning models often lack cohesive theoretical foundations. Instead, they are validated through their demonstrated performance, similar to how virtue epistemology bases knowledge on the abilities of epistemic agents rather than on universal theories. This raises profound questions about the nature of scientific knowledge in an age of AI-driven discovery.
Although the book requires some mathematical background, its philosophical insights make it invaluable for anyone looking to comprehend not just how statistical methods work, but what they mean and what assumptions they embody about knowledge and reality.
To conclude this exploration of philosophical readings in statistics, it’s worth reflecting on how these perspectives enrich our practice as data scientists and statisticians. Box’s warnings against “cookbookery” and “mathematistry,” Suppes’ emphasis on the historical context of measurement error, Salsburg’s engaging portraits of statistical pioneers, and Otsuka’s profound philosophical framework all remind us that statistics is far more than merely applying formulas to data. These readings invite us to consider the ontological assumptions we hold about what exists in the world, the semantic interpretations that give meaning to our models, and the epistemological foundations that justify our conclusions. By delving into the philosophical dimensions of statistical thinking, we not only become more reflective practitioners but also develop a deeper appreciation for statistics as a means of understanding an uncertain world. Whether you’re a seasoned statistician or a newcomer to data science, these readings offer valuable insights into the rich intellectual tradition that shapes how we reason with data.
