Norvig on Chomsky

On Chomsky and the Two Cultures of Statistical Learning, in which Norvig responds to Chomsky's remarks on Steven Pinker's question about the success of probabilistic models trained with statistical methods. chomsky norvig


Chomsky said words to the effect that statistical language models have had some limited success in some application areas.

Let's look at computer systems that deal with language, and at the notion of "success" defined by "making accurate predictions about the world." First, the major application areas: Search engines ... Speech recognition ... Machine translation ... Question answering ...

Now let's look at some components that are of interest only to the computational linguist, not to the end user: Word sense disambiguation ... Coreference resolution ... Part of speech tagging ...

Clearly, it is inaccurate to say that statistical models (and probabilistic models) have achieved only limited success; rather, they have achieved a dominant (though not exclusive) position.

When Chomsky said "That's a notion of [scientific] success that's very novel. I don't know of anything like it in the history of science," he apparently meant that the notion of success as "accurately modeling the world" is novel, and that the only true measure of success in the history of science is "providing insight" — answering why things are the way they are, not just describing how they are.

Another part of Chomsky's objection is "we cannot seriously propose that a child learns the values of 10^9 parameters in a childhood lasting only 10^8 seconds." (Note that modern models are much larger than the 10^9 parameters that were contemplated in the 1960s.) But of course nobody is proposing that these parameters are learned one-by-one.
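The point about one-by-one learning can be made concrete. In gradient-based learning (the standard training method for modern statistical models, used here as an illustrative stand-in for whatever learning mechanism the brain employs), a single learning event adjusts every parameter simultaneously, so the number of parameters is not bounded by the number of seconds of experience. A minimal sketch, with a hypothetical one-million-parameter linear model:

```python
import numpy as np

# Toy illustration: one stochastic-gradient step on a linear model.
# A single observed example adjusts *all* parameters at once, so
# parameter count need not be limited by the count of learning events.
rng = np.random.default_rng(0)

n_params = 1_000_000           # stand-in for a large parameter vector
w = rng.normal(size=n_params)  # model parameters
x = rng.normal(size=n_params)  # one observed example (features)
y = 1.0                        # its target value

pred = w @ x                   # model prediction for this example
grad = 2 * (pred - y) * x      # squared-error gradient: touches every entry
w -= 1e-6 * grad               # every parameter moves in this single step

print(n_params)
```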

And yes, it seems clear that an adult speaker of English does know billions of language facts (for example, that one says "big game" rather than "large game" when talking about an important football game). These facts must somehow be encoded in the brain.
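A fact like the "big game" preference is exactly what a statistical model encodes as a count or probability. A minimal sketch using a hypothetical mini-corpus (real models would draw counts from a large corpus such as web text or n-gram data):

```python
from collections import Counter

# Hypothetical mini-corpus standing in for real usage data.
corpus = (
    "the big game is on saturday . fans await the big game . "
    "a large game of chess . the big game drew a crowd ."
).split()

# Count adjacent word pairs (bigrams).
bigrams = Counter(zip(corpus, corpus[1:]))

# The model encodes the idiomatic preference directly as relative counts:
print(bigrams[("big", "game")])    # 3
print(bigrams[("large", "game")])  # 1
```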

Thus it seems that grammaticality is not a categorical, deterministic judgment but rather an inherently probabilistic one. This becomes clear to anyone who spends time observing a corpus of actual sentences, but can remain unknown to those who think that the object of study is their own set of intuitions about grammaticality.
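The probabilistic view of grammaticality can be sketched with a bigram language model that assigns every word sequence a graded score rather than a binary grammatical/ungrammatical verdict. The corpus and sentences here are hypothetical, and add-one smoothing keeps unseen word pairs above zero probability:

```python
import math
from collections import Counter

# Hypothetical training corpus with sentence boundary markers.
corpus = "<s> the dog barks </s> <s> the cat sleeps </s> <s> a dog sleeps </s>".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab = len(unigrams)

def log_prob(sentence):
    """Add-one-smoothed bigram log-probability: a graded score, not a yes/no."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(words, words[1:])
    )

# More natural word orders get higher (less negative) scores — by degree:
print(log_prob("the dog sleeps") > log_prob("sleeps the dog"))  # True
```

Every sentence, however odd, gets some probability; "grammaticality" falls out as a continuum of scores rather than a categorical judgment.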

It seems that the algorithmic modeling culture is what Chomsky is objecting to most vigorously. It is not just that the models are statistical (or probabilistic), it is that they produce a form that, while accurately modeling reality, is not easily interpretable by humans, and makes no claim to correspond to the generative process used by nature.

Chomsky dislikes statistical models because they tend to make linguistics an empirical science (a science about how people actually use language) rather than a mathematical science (an investigation of the mathematical properties of models of formal language).


See Statistical Modeling Cultures whose definitions Norvig relies on heavily in the least defensive part of his reply.

See One Hour Wikipedia where I make model building an exploratory activity.

See Personatron for a hint of my unpublished work based on emergent model building.