Language, computers & statistics: not just for geeks. Corpus linguistics in the A level classroom

Dispersion – comparing CQPWeb and Sketch Engine

In exercise 1f you looked at dispersion plots for “menstruating” and “raping” as collocates of “woman”. If you create a concordance for each word, you can then produce a dispersion plot by selecting the bar chart symbol at the top right of the concordance. How does Sketch Engine’s dispersion graph compare to CQPWeb’s? Do you have a preference?

Exercise 3: using the JSI web corpus 2014-19 English on Sketch Engine:


The Timestamped JSI web corpus 2014-2019 English is a very large corpus which is still being built automatically. It consists of English-language news reports, sourced via their RSS feeds. It therefore represents a very specific type of discourse and it is cross-national, unlike the BNC, which is a general corpus of British English only. Return to the dashboard and select this corpus.
Click on information about the corpus – how big is this corpus? (how many tokens? how many words?)
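If the difference between the two figures is puzzling, the short Python sketch below may help. It assumes the usual convention that a token is any unit the tokenizer separates out (including numbers and punctuation), while a word is a token that begins with a letter. The tokenizer here is a deliberately crude illustration and not the one Sketch Engine actually uses; the function names and the sample sentence are invented for this example.

```python
import re

def tokenize(text):
    # Crude illustrative tokenizer (an assumption, not Sketch Engine's own):
    # runs of letters/digits, optionally joined by a hyphen or apostrophe,
    # or any single punctuation character.
    return re.findall(r"\w+(?:[-'’]\w+)*|[^\w\s]", text)

def count_tokens_and_words(text):
    tokens = tokenize(text)
    # Assumed convention: a "word" is a token that starts with a letter.
    words = [t for t in tokens if t[0].isalpha()]
    return len(tokens), len(words)

sample = "In 2019, the corpus grew daily."
n_tokens, n_words = count_tokens_and_words(sample)
print(n_tokens, n_words)  # 8 tokens, 5 words
```

The number "2019", the comma and the full stop count as tokens but not as words, which is why the token count for a corpus is always at least as large as its word count.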

Now run the calculation for Keywords for this corpus. It will take several minutes (because of the size of the corpus). Make a note of the top 10 keywords.
What type of words dominate this list? Why do you think this might be the case?
Keywords are often said to indicate the “aboutness” of a discourse. Does this keyword list give you a sense of what the news has been about in the last 5 years?
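To see why some words rise to the top of a keyword list, it can help to look at how a keyness score is worked out. The sketch below assumes the "simple maths" approach described in Sketch Engine's documentation: the frequency per million of a word in the focus corpus is compared with its frequency per million in a reference corpus, with a smoothing value added to both. The function names, corpus sizes and counts are all invented for illustration and are not taken from the JSI corpus.

```python
def per_million(count, corpus_size):
    # Normalise a raw count to a frequency per million tokens.
    return count * 1_000_000 / corpus_size

def keyness(focus_count, focus_size, ref_count, ref_size, smoothing=1.0):
    # "Simple maths" style score (assumed formula):
    # (fpm in focus corpus + n) / (fpm in reference corpus + n)
    fpm_focus = per_million(focus_count, focus_size)
    fpm_ref = per_million(ref_count, ref_size)
    return (fpm_focus + smoothing) / (fpm_ref + smoothing)

# Invented toy figures: a 1-million-token focus corpus
# and a 2-million-token reference corpus.
FOCUS_SIZE = 1_000_000
REF_SIZE = 2_000_000

# word: (count in focus corpus, count in reference corpus)
toy_counts = {
    "minister": (500, 10),
    "the": (60_000, 120_000),
}

for word, (focus_count, ref_count) in toy_counts.items():
    score = keyness(focus_count, FOCUS_SIZE, ref_count, REF_SIZE)
    print(word, round(score, 2))
```

In this toy example "the" is far more frequent than "minister" in raw terms, but it is equally common in both corpora, so its score stays close to 1. "minister" is much more frequent in the focus corpus than in the reference corpus, so it scores highly and would appear near the top of the keyword list.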

