Language, computers & statistics: not just for geeks. Corpus linguistics in the A level classroom

Exercise 2: exploring corpus linguistics with Sketch Engine

Glossary of corpus linguistics terminology used here

Exercise 2: exploring corpus linguistics with Sketch Engine


Task explanation:


Once you are on the Sketch Engine dashboard, start typing the name of the corpus you want to work with in the box at the top of the page (next to the title “Dashboard”). Select the BNC again; this is the same corpus we have been using with CQPweb.

Word lists and frequencies of word occurrences in corpora
Select the “Word List” function and, in the menu that follows, keep the settings as they appear (“Word” and “All”). Sketch Engine will calculate a list of the most frequent words in the BNC. Make a note of the top 10 or so. What do you notice (e.g. the types of words, their frequencies)?
Sketch Engine also lets you make a frequency list restricted to a particular word class. Go back to the dashboard and select “Word List” again, but this time select “Noun” and “All” as the settings. This will give you a frequency list of nouns only. What do you notice about this frequency list?
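
To see what a word frequency list involves behind the scenes, here is a minimal Python sketch. It is not how Sketch Engine itself works; the sample sentence, the function name `word_frequencies` and the very simple rule for splitting text into words are all assumptions made for illustration.

```python
from collections import Counter
import re


def word_frequencies(text):
    """Count how often each lower-cased word occurs in `text`."""
    # Very rough tokenisation (an assumption): runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# A made-up mini "corpus", just for illustration.
sample = "The cat sat on the mat. The dog sat on the cat."

freqs = word_frequencies(sample)

# The three most frequent words with their counts.
top_three = freqs.most_common(3)
print(top_three)
```

Even in this tiny sample the most frequent word is a grammatical word ("the"), which is worth bearing in mind when you look at the top of the BNC list.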


Keyword lists, keywords and keyness:
Frequency lists are often good starting points for investigating a corpus. Another typical starting point is a keyword list. This is NOT to be confused with “key word in context” (KWIC), which refers to the node word shown in the centre of a concordance line. Keywords are words that appear more frequently in the corpus you are investigating (the focus corpus) than you would expect, given how often they occur in a (large) general corpus (the reference corpus). A keyword list is calculated by comparing the two.
Keywords often include proper nouns and subject-specific lexis (especially when the corpus consists of texts from one particular field). More usefully, keywords can often give insight into what a particular discourse (as represented by the corpus you are investigating) is about.
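
The comparison between a focus corpus and a reference corpus can be sketched in a few lines of Python. This is a simplified illustration only: it assumes a keyness score calculated as a ratio of smoothed per-million frequencies, in the spirit of the “simple maths” score described in Sketch Engine's documentation. The smoothing value of 1, the corpus sizes and all the word counts are made up for the example and are not BNC figures.

```python
def per_million(count, corpus_size):
    """Convert a raw count into a frequency per million words."""
    return count * 1_000_000 / corpus_size


def keyness(focus_count, focus_size, ref_count, ref_size, smoothing=1.0):
    """Ratio of smoothed per-million frequencies (focus vs reference).

    The smoothing value (an assumption here) avoids dividing by zero
    when a word does not occur in the reference corpus at all.
    """
    focus_fpm = per_million(focus_count, focus_size)
    ref_fpm = per_million(ref_count, ref_size)
    return (focus_fpm + smoothing) / (ref_fpm + smoothing)


# Hypothetical counts and corpus sizes, for illustration only.
FOCUS_SIZE = 100_000
REF_SIZE = 100_000_000
focus_counts = {"corpus": 50, "the": 6000, "concordance": 20}
ref_counts = {"corpus": 100, "the": 6_000_000, "concordance": 0}

scores = {
    word: keyness(focus_counts[word], FOCUS_SIZE, ref_counts.get(word, 0), REF_SIZE)
    for word in focus_counts
}

# Words ranked from most "key" to least "key".
ranked = sorted(scores, key=scores.get, reverse=True)
for word in ranked:
    print(word, round(scores[word], 1))
```

In this toy example "the" is by far the most frequent word in the focus corpus, but its score is about 1 because it is just as frequent (per million words) in the reference corpus. A high score means a word is relatively much more frequent in the focus corpus, which is what makes it "key".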

Select the “Keywords” function from the dashboard, leave the settings as they are and click on “I know what I’m doing. Go!”. It will take a little time for the software to calculate the keywords. Sketch Engine will give you both one-word keywords and multi-word keywords. For the purpose of this exercise, we’ll focus on one-word keywords (multi-word keywords take longer to calculate).

Make a note of the top 10 one-word keywords. What do you notice? Can you explain this?

If you click on the “eye” icon at the top right of the keywords results page, you can choose to display the scores by ticking the box for scores. Seeing the scores makes it easier to get a sense of the strength of “keyness” and to compare how “key” different keywords are.


