Many researchers have great war stories to tell about the perilous waters between correlation and causation. Here is my personal favorite:
In the late ’90s, I was working with neurosurgery patients in a hospital’s medical psychology clinic. We gave each patient a battery of cognitive tests before surgery and administered the same battery six months afterward. Our goal was to check for cognitive changes that might have resulted from the surgery. One researcher from outside the clinic focused on our strongest finding: a significant reduction in anxiety from pre-op to post-op. She hypothesized that this dramatic finding was evidence that the surgery had affected the neural basis of anxiety. Had she only taken a minute to explain her hypothesis in plain terms to a layperson, especially one who could imagine the anxiety a patient might experience in the hours before brain surgery, she surely would have withdrawn her request for our data and slipped quietly out of our clinic.
“Correlation does not imply causation” is a research catchphrase that is drilled into practitioners from internhood and intro classes onward. It is particularly relevant when working with language, because all linguistic behavior is highly patterned. Researchers from many other disciplines would kill to have chi-square results as strong as linguists’. In fact, linguists have to reach deeper into their statistical toolkits, because when nearly everything tests as significant, significance levels alone can be misleading or inadequate.
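To make the point about significance levels concrete, here is a minimal sketch in Python. The counts are invented for illustration (imagine two word variants tallied in two hypothetical corpora), and pairing the chi-square test with Cramér's V as an effect size is my assumption about what "reaching deeper into the toolkit" could look like, not a prescription. The test is computed by hand, without a continuity correction, so the example needs only the standard library.

```python
import math

def chi_square_2x2(table):
    """Return (chi2, p_value, cramers_v) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper tail of the chi-square
    # distribution is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # For a 2x2 table, Cramer's V reduces to sqrt(chi2 / n).
    cramers_v = math.sqrt(chi2 / n)
    return chi2, p_value, cramers_v

# Hypothetical counts: the same 50.5% vs. 49.5% split at two corpus sizes.
small = [[505, 495], [495, 505]]
large = [[50_500, 49_500], [49_500, 50_500]]

for label, table in (("small", small), ("large", large)):
    chi2, p, v = chi_square_2x2(table)
    print(f"{label}: chi2={chi2:.2f}, p={p:.2g}, V={v:.3f}")
```

The two tables have identical proportions, so the effect size is the same tiny value in both. Only the larger table clears p < 0.001, which shows that a very small p-value can reflect nothing more than a very large sample.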
People who use language but don’t study linguistics usually aren’t aware of the degree of patterning that underlies communication. Both language learning and language use have statistical underpinnings. It is because of this patterning that linguistic machine learning is possible. But linguistic patterning is a double-edged sword: potentially helpful in programming and harmful in analysis. Correlations abound, and most of them are real, although, statistically speaking, some will be products of peculiarities in a dataset. Outside of any context or theory, though, these findings are meaningless. They say nothing about the underlying relationship between the variables.
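The remark about correlations that are products of a dataset's peculiarities can be illustrated with a small simulation. This is only a sketch under assumed settings: 20 variables of pure random noise with 30 observations each (both numbers are arbitrary choices of mine), with Pearson's r computed for every pair.

```python
import itertools
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable
n_variables, n_observations = 20, 30

# Every variable is independent noise: no real relationship exists.
data = [
    [rng.gauss(0, 1) for _ in range(n_observations)]
    for _ in range(n_variables)
]

correlations = [
    pearson_r(data[i], data[j])
    for i, j in itertools.combinations(range(n_variables), 2)
]
strongest = max(abs(r) for r in correlations)
print(f"{len(correlations)} pairs, strongest |r| = {strongest:.2f}")
```

With 190 pairs to choose from, some will look respectably correlated even though nothing connects them. Without context or theory, such a result cannot be told apart from a real one.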
A word of caution to researchers whose work centers on the discovery of correlations: be careful with your findings. You may have found evidence that a correlation exists. But that is all you have found. Take your next steps carefully. First, step back and think about your work in layman’s terms. What did you find, and is it really meaningful? If your findings still show promise, dig deeper. Try to get a better idea of what is happening. Get some context.
Because a correlation alone is no gold nugget. You may think you’ve found some fashion, but your emperor could very well still be naked.