Dressing a Stone in Silk

The Ridiculous Posturing of Regression Techniques in Science

Sean McClure
3 min read · Nov 18, 2024

Stop telling people that sophisticated experts are working on hard problems when what they’re doing is using regression techniques. There is no universe where regression is sophisticated or realistic. It is a toy technique for toy theories used to convince naive laymen that causes have been found and sCiEnCe is happening.

All regression techniques make the convenient and unrealistic assumption that dependent variables are systematically influenced by independent variables.

Most studies use the childish form of regression, linear regression, where we pretend nature's outputs have a straight-line relationship to the countless influences that make up a natural system. Worse, these influences are assumed to be independent; one of the most absurd premises one could possibly make about nature. Other assumptions regarding homoscedasticity and normality both speak to a conveniently static snapshot of a fictional representation that allows for things like "parametric tests" and "confidence intervals," which test how well data were artificially selected, and gift us with delusional levels of certainty.
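For the curious, the entire toy fits in a few lines. A minimal sketch (statsmodels and the fabricated data are illustrative choices of mine, not taken from any particular study) of a world manufactured to satisfy every assumption the model demands:

```python
import numpy as np
import statsmodels.api as sm

# Fabricate a conveniently tidy world: one straight-line cause, Gaussian
# noise, constant variance. Every assumption the model needs, baked in.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, 100)  # linear, homoscedastic, normal

X = sm.add_constant(x)             # design matrix: intercept plus one "independent" variable
model = sm.OLS(y, X).fit()         # ordinary least squares

print(model.params)                # fitted coefficients
print(model.conf_int(alpha=0.05))  # the "confidence intervals" in question
print(model.pvalues)               # the parametric tests
```

The certainty it reports is real only because the data were manufactured to obey the model's assumptions in the first place.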

Oh, but "not so fast," they will scream. Regression is hardly limited to its linear underling. For there is nonlinear regression 🦸, where interactions and nonlinearities can be incorporated 💪.

Sure, by naive and brutally limited design. Nonlinear regression still relies on pre-specified functional forms (polynomials, exponential models, etc.). There is no such thing as seeing the functional form of nature, making such schemes laughable to any intellect with even a casual appreciation of nature’s complexity.
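To see just how pre-specified the "flexibility" is, consider a minimal sketch (scipy's curve_fit and the exponential toy are illustrative choices of mine): the fitter must be handed nature's functional form before a single data point is seen.

```python
import numpy as np
from scipy.optimize import curve_fit

# The "nonlinear" upgrade: you must still write nature's functional form
# down in advance. Here we simply decree that the world is exponential.
def decreed_form(x, a, b):
    return a * np.exp(b * x)

rng = np.random.default_rng(1)
x = np.linspace(0, 2, 50)
y = 3.0 * np.exp(0.8 * x) + rng.normal(0, 0.2, 50)

params, _ = curve_fit(decreed_form, x, y, p0=[1.0, 1.0])
print(params)  # "works" only because we guessed the form nature was using
```

Decree a different form and the same data yield entirely different "discovered" parameters; the conclusion was smuggled in with the equation.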

And do not tell me neural networks are a form of regression. What utter nonsense. There are no explicit assumptions about the functional form of relationships here. Such networks messily approximate any function, not in anything so naive as “form” but as an opaque blob of fantastically connected values. Here, one builds a better model by adding more reality (data) and architecture depth, not by crafting some miniature effigy of what one wishes nature were.
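A minimal sketch of the contrast (scikit-learn's MLPRegressor and the synthetic data are my illustrative choices): nowhere below is a functional form written down. The network is simply pointed at data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# No equation is specified anywhere: data in, an opaque blob of
# connected weights out.
rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, (2000, 1))
y = np.sin(X[:, 0]) * X[:, 0] ** 2 + rng.normal(0, 0.1, 2000)

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(X, y)
print(net.predict([[1.5]]))  # an approximation, not a "form"
```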

Networks do not seek handcrafted features as the number of variables grows. They don't want your cute little interaction terms or designed transformations, as if you would know anything about what these might look like. As though your "domain knowledge" would teach something to the opaque blob of fantastically connected values. Networks handle high dimensionality by default. The interactions and hierarchies precipitate because you didn't get in the way.
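The same point in code, as a sketch (again MLPRegressor, again fabricated data): fifty raw inputs, an interaction buried among them, and no engineered terms offered.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Fifty raw inputs with an interaction buried inside, and no interaction
# terms or transformations handed to the model.
rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 50))
y = X[:, 0] * X[:, 1] + np.tanh(X[:, 2]) + rng.normal(0, 0.1, 5000)

net = MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0)
net.fit(X, y)
print(net.score(X, y))  # training fit: the interaction was learned, not specified
```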

The regression kings don't use small datasets because sufficient samples are hard to come by; they use small datasets because that's the only place regression can operate. One must create sterile, contrived little worlds in order to reveal "well understood relationships." Throw even a modicum of realistic data at these models and they explode into instability and overfitting. Deep learning requires gross amounts of data because it can only work when something realistic is fed into its jaws.
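You can watch the explosion happen. A minimal sketch (the ninth-degree polynomial and the twenty noisy points are contrivances of mine, not from any cited study):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# A 9th-degree polynomial on 20 noisy points: near-perfect in its sterile
# little world, unstable the moment unseen data arrive.
rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 20).reshape(-1, 1)
y = np.sin(3 * x[:, 0]) + rng.normal(0, 0.3, 20)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.5, random_state=0)

model = make_pipeline(PolynomialFeatures(9), LinearRegression())
model.fit(x_tr, y_tr)
print(model.score(x_tr, y_tr))  # flattering fit on the training toy
print(model.score(x_te, y_te))  # collapses on held-out data
```

On its ten training points the fit looks immaculate; on the ten held-out points the score collapses, often below zero.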

And finally, stop with the interpretability nonsense. There is no such thing as iNtErPrEtAbIlItY when it comes to natural systems. There are no causal lines of determinism that can be teased out. Interpretability degrades as you approach complexity because there is none once you get there. A regression model facing reality is doubly pathetic, as it is both uninterpretable and useless.

Regression works really, really well for one thing: modeling a pretend version of nature. A toy technique, for toy theories, used to convince the naive that sCiEnCe is happening.

Stop dressing a stone in silk.

Written by Sean McClure

Independent Scholar; Author of Discovered, Not Designed; Ph.D. Computational Chem; Builder of things; I study and write about science, philosophy, complexity.
