Saturday, December 18, 2010

The Cultural Genome

Google Books Ngram Viewer: prevalence of the word 'apocalypse' from 1500 to 2000.

More weird Google Books news has come to light. The Chronicle of Higher Education and the Harvard Gazette are reporting that Google Books and Harvard researchers are using computer algorithms to assess the rise and fall of certain words and ideas in our culture by crunching through all the words in 5.2 million digitized books, originally published between 1500 and 2008. This sample represents roughly 4 per cent of all the books ever published. The research leaders describe the prevalence of words over time as a cultural "fossil record." Their comments are littered with weird neologisms lifted from economics and the sciences. They say they are searching for the "cultural genome" using "culturomics." You can test their work by typing different words into the Google Books Ngram Viewer here. It will show you how frequently those words were used over time, but it won't tell you whether a word's meaning changed along the way. I typed in the word 'apocalypse' to check its use from 1500 to 2000. You can see the results in the image above.
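For the curious, here is roughly what "frequency over time" means in practice. This is only a minimal sketch in Python, not the researchers' actual method: the function name `yearly_frequency`, the crude punctuation stripping, and the tiny two-year corpus are all my own assumptions, made up for illustration.

```python
from collections import Counter


def yearly_frequency(corpus, word):
    """Return {year: share of that year's words that equal `word`}.

    `corpus` maps a publication year to a list of text snippets.
    This is a hypothetical helper, not part of any Google tool.
    """
    word = word.lower()
    result = {}
    for year, texts in sorted(corpus.items()):
        # Split on whitespace, strip surrounding punctuation, lowercase.
        tokens = [t.strip(".,;:!?\"'()").lower()
                  for text in texts for t in text.split()]
        tokens = [t for t in tokens if t]
        counts = Counter(tokens)
        # Relative frequency, so big and small years are comparable.
        result[year] = counts[word] / len(tokens) if tokens else 0.0
    return result


# Invented toy corpus, not real Google Books data.
toy_corpus = {
    1900: ["The apocalypse is near.", "A quiet year in the village."],
    1950: ["No news today."],
}

print(yearly_frequency(toy_corpus, "apocalypse"))
```

Dividing by the total number of words in each year is the important bit: far more books were printed in 1950 than in 1600, so raw counts alone would make almost every word look like it was on the rise.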

From the Harvard report:
Researchers have been tracking the frequency with which words appear in books, allowing scholars the ability to more precisely quantify a wide variety of cultural and historical trends. Leading the four-year effort are Harvard's Jean-Baptiste Michel ... a postdoctoral researcher in the Department of Psychology and Program for Evolutionary Dynamics, and Erez Lieberman Aiden, a junior fellow in Harvard’s Society of Fellows. ... About 72 percent of [the whole] text is in English, with smaller amounts in French, Spanish, German, Chinese, and Russian. It is the largest data release in the history of the humanities, the authors note, a sequence of letters 1,000 times longer than the human genome. If written in a straight line, it would reach to the moon and back 10 times over.
The science metaphors are heavy going, and the results are being questioned by critics. But they're still interesting. From the Chronicle report, some of the team's findings:
The English lexicon grew by 70 percent from 1950 to 2000, with roughly 8,500 new words entering the language each year. Dictionaries don't reflect a lot of those words. "We estimated that 52 percent of the English lexicon—the majority of the words used in English books—consists of lexical 'dark matter' undocumented in standard references," the authors write.

Researchers tracked references to individual years to demonstrate how humanity is forgetting its past more quickly. Take "1880": It took 32 years, until 1912, for references to that year to fall by half. But references to "1973" fell by half within 10 years.

Compared with their 19th-century counterparts, modern celebrities are younger and more well known—but their time in the limelight is shorter. Celebrities born in 1800 initially achieved fame at an average age of 43, compared with 29 for celebrities born in 1950.
The research was published online on December 16 in the journal Science, here (Science DOI: 10.1126/science.1199644).
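The "fall by half" figures in the Chronicle excerpt above are easy to picture as a small calculation. The sketch below assumes the simplest reading, which is to count the years until mentions drop to half of their level in the named year itself; the paper's exact method may differ. The helper name `years_to_halve` and the numbers in `mentions_of_1880` are invented, chosen only so that the result matches the quoted 32 years.

```python
def years_to_halve(series, start_year):
    """Years after `start_year` until the value first drops to half
    (or less) of its `start_year` level; None if it never does.

    `series` maps a year to some measure of how often it is mentioned.
    """
    baseline = series[start_year]
    for year in sorted(y for y in series if y > start_year):
        if series[year] <= baseline / 2:
            return year - start_year
    return None


# Invented numbers for illustration only, not taken from the study.
mentions_of_1880 = {1880: 100, 1890: 80, 1900: 65, 1912: 50, 1920: 40}

print(years_to_halve(mentions_of_1880, 1880))
```

With these made-up values the function returns 32, since 1912 is the first year at or below half of the 1880 level.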

Click for my earlier post on cultural analysis and Google Books.

2 comments:

  1. Too much information, and too much of it disposable. This does not take into account deliberate destruction of information, of course... Last year, Amazon used a back door in its e-reader to remotely delete thousands of copies of 1984, by George Orwell.

  2. Thanks for the comment, Jay. It's hard to know what to make of these cryptic details.
