Calculating How Much Data a Child Sees, and Other Reductionist Nonsense Spouted by AI Gurus

Sean McClure
3 min read · Dec 4, 2024

--

Yann LeCun said this:

…to which I responded with:

…to which Yann then responded with:

…to which I responded with:

Nature would not use “metrics” in the form we have invented them: as strict quantitative measures used to evaluate various aspects of a system’s performance, and/or as a kind of “distance” between elements of a set.

Such constrained evaluation only makes sense when framing a system in terms of low-dimensional constructs (the “cogs” and “pistons” we use to make machines). Nature is most surely not composed of “sets” and “distances” that drive its behavior.

“Performance” in nature is a matter of survival, not of attending to precise definitions.

Take some of your statements:

“The data bandwidth of visual perception is roughly 16 million times higher than the data bandwidth of written (or spoken) language.”

There is little meaning to this statement beyond forcing our definition of visual perception into a low-dimensional framework that is convenient for discussing (pretending to know) how things “work.” Visual perception is the ability to interpret the surrounding environment through vision. What does that interpretation have to do with bandwidth? I am not saying nothing, but to say “16 million times higher” you would need a solid conceptual connection between perception and bandwidth, and no such connection exists in science.

You are introducing the premise that the rate at which data can be transmitted from one location to another within a given time frame allows us to measure visual perception. It does not. Even the notion that we can know how much data impinges on our senses is highly dubious, let alone comparing it to the “bandwidth” of written/spoken language, whatever that is supposed to mean. Man-made “tokens” are not things of nature; they are invented demarcations of an otherwise deeply interconnected natural phenomenon (i.e. language), demarcations that conveniently fit how we instruct programs to carry out tasks.

And this doesn’t even touch the real end goal of any such statement: to assess the amount and/or quality of information received in such processes. Where in your metrics does the amount and/or quality of information come from? Shannon? His entire framework is based on machine-to-machine transmission, not the interpretation, meaning, or even use of such information. Whatever humans are doing, in conjunction with the brain/mind, it most surely is not merely transactional.
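To make the point about Shannon concrete: entropy in his framework is a property of symbol statistics alone. A minimal sketch (my own illustration, not from the exchange) shows that a meaningful sentence and a random scramble of the same characters carry identical Shannon entropy, because meaning never enters the calculation:

```python
import math
import random
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy in bits per symbol, computed from character
    frequencies alone. Counts are sorted so the floating-point sum is
    deterministic for any rearrangement of the same characters."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in sorted(counts.values()))

meaningful = "visual perception is the ability to interpret the environment"
scrambled = "".join(random.sample(meaningful, len(meaningful)))  # same chars, no meaning

# Entropy depends only on symbol statistics, so both values are identical:
print(shannon_entropy(meaningful) == shannon_entropy(scrambled))  # True
```

Whatever this number measures, it is not interpretation or use; the scramble scores exactly as well as the sentence.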

“In a mere 4 years, a child has seen 50 times more data than the biggest LLMs trained on all the text publicly available on the internet.”

Are we supposed to take this as a scientifically accurate statement? I am not saying humans don’t take in more information (undoubtedly they do), but “50 times more”…where did this number come from? A calculation that has little to do with nature, no doubt.
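For context, headline ratios like this typically come from back-of-envelope arithmetic of roughly the following shape. Every constant below is an illustrative placeholder I have assumed for the sketch, not LeCun’s actual figures, which is exactly the problem: the “answer” is just a function of whichever constants one chooses.

```python
# Illustrative back-of-envelope arithmetic of the kind being critiqued.
# Every figure below is an assumed placeholder, not a measured quantity.

SECONDS_AWAKE_BY_AGE_4 = 4 * 365 * 12 * 3600   # assume ~12 waking hours/day
OPTIC_NERVE_BYTES_PER_SEC = 1e6                # assumed "bandwidth" of vision
LLM_TRAINING_BYTES = 2e13                      # assumed size of a text corpus

visual_bytes = SECONDS_AWAKE_BY_AGE_4 * OPTIC_NERVE_BYTES_PER_SEC
ratio = visual_bytes / LLM_TRAINING_BYTES
print(f"child/LLM data ratio: {ratio:.0f}x")  # prints: child/LLM data ratio: 3x
```

Swap in a different presumed “bandwidth” for vision and the ratio swings by orders of magnitude; the number reflects the chosen constants, not nature.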

The problem with these kinds of statements is not their conclusions, which I don’t necessarily disagree with, but their premises, which all rest on the implicit assumption that we can reverse engineer nature and understand its behavior using precisely defined metrics. As though defining “bandwidths” and “tokens” has anything to do with what emerges. It doesn’t make for a good argument because that is not how complexity and nature work.

Just because something sounds more scientific doesn’t mean it is.

…to which Yann has yet to respond. And as gurus go, we can expect that he won’t.

Next.


Written by Sean McClure

Independent Scholar; Author of Discovered, Not Designed; Ph.D. Computational Chem; Builder of things; I study and write about science, philosophy, complexity.
