Retiring English prof pairs statistical analysis and language


Professor Vanden Bosch retires this May and will leave a legacy of using the power of language databases to research patterns. Photo by David Fitch.

English professor James Vanden Bosch will retire this May, having advocated the combination of math and language at Calvin for over a decade. The method, known as corpus linguistics, allows scholars to make statistically significant claims about language and to understand how language varies over time.

Corpus linguistics compiles millions of words into large bodies of text, called corpora. Linguists analyze corpora for word frequency, collocations, and readability. Collocations are words that habitually appear together in specific contexts. In a Google search, typing a word into the search bar brings up several suggested phrases; these represent collocates because the words are frequently paired.
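The two basic measures described above, word frequency and collocation, can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular corpus tool: the sample sentence, the function names, and the one-word search window are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each word occurs in the token list."""
    return Counter(tokens)


def collocates(tokens, node, window=1):
    """Count the words found within `window` positions of each `node` occurrence."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Hypothetical sample text; a real corpus would hold millions of words.
sample = (
    "Strong tea and strong coffee are common, but powerful tea is rare. "
    "She drinks strong tea."
)
tokens = tokenize(sample)
frequencies = word_frequencies(tokens)
neighbors = collocates(tokens, "strong")

print(frequencies.most_common(3))
print(neighbors.most_common(3))
```

In this toy sample, "tea" turns up next to "strong" more often than any other word, which is the kind of pairing a linguist would flag as a collocation.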

Through corpus study, linguists define implicit language features and investigate the meaningful differences between texts.
“If you’ve got large data that turn out to be words, what you want to be able to do is have a 35,000 foot view of those words as well as the close-up reading,” Vanden Bosch explained.

Vanden Bosch is a self-taught corpus linguist who now has a reputation as the corpus evangelist on campus. He stumbled upon the field while performing groundbreaking research on the Second Amendment to the U.S. Constitution: “A well regulated Militia, being necessary to the security of a free State, the right of the people to keep and bear Arms, shall not be infringed.”
The use of the absolute phrase, “being necessary,” has changed since the 1700s. His research focuses on the historical usage of the absolute phrase in order to uncover the original writer’s intent. For this work, Vanden Bosch uses a corpus of Supreme Court opinions from the 1790s. He will present this material in June 2018 at the University of Malta.

“Legal scholars have a deep desire to know this although almost none of them are working in this territory very consistently,” he said.
Vanden Bosch’s pursuit of corpus linguistics introduced him to Brigham Young University (BYU) professor Mark Davies. Davies runs the largest online corpora, including the Corpus of Contemporary American English (COCA), which contains more than 560 million words. It includes five genres of text from 1990 to 2017: transcripts of spoken English, academic writing, newspapers, fiction and magazines.

Davies presented his corpora to Calvin faculty and students in the spring of 2017. The corpora are free and accessible to the public through the BYU website.

COCA users can search for a word, “evangelical,” for example, and find that it came into use in the 1810s and quadrupled in frequency in the 1990s, and that it is now used to describe Christians at a statistically significant rate (a mutual information, or MI, score of 6.51). Researchers can also see that the word has been redefined during certain spikes in usage that correlate with historical events.
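An MI score measures how much more often two words appear together than chance alone would predict. The sketch below uses the textbook form of pointwise mutual information; it is an assumption that this matches how COCA arrives at its figures, since the site's own calculation may, for instance, also adjust for the width of the search span. The counts are invented for illustration and are not taken from COCA.

```python
import math


def mutual_information(pair_count, count_a, count_b, corpus_size):
    """Return pointwise mutual information in bits: log2(P(a, b) / (P(a) * P(b)))."""
    if min(pair_count, count_a, count_b, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2((pair_count * corpus_size) / (count_a * count_b))


# Hypothetical counts: word A occurs 1,000 times, word B occurs 2,000 times,
# and the two occur together 64 times in a corpus of 1,000,000 words.
score = mutual_information(
    pair_count=64, count_a=1000, count_b=2000, corpus_size=1_000_000
)
print(f"MI = {score:.2f}")
```

With these made-up numbers the pair occurs 32 times more often than chance would predict, and each doubling of that ratio adds one point to the score.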

Vanden Bosch leaves behind a legacy of pairing statistical analysis and language for students like Brianna Busscher, a senior writing and biology major. Her research compares the linguistic patterns of scientific academic writing and popular science magazines.
“I think COCA makes linguistic analysis accessible,” she said. “You don’t have to be a grammar expert to use it.” She believes her corpus studies make her more self-aware as a scientific writer.

Vanden Bosch said that, while he is unsure of all that the future holds for him, he is considering a short-term teaching position in the Philippines.