Sandra Gollin Kies, Department of Languages and Literature, Benedictine University
Daniel Kies, Department of English, College of DuPage
Cameron Nazer Mozafari, Department of English, University of Maryland, College Park
Corpora that can be searched freely online. Each comes with its own search engine or interface and its own set of features. Some of the websites offer searches in more than one corpus.
Probably the most comprehensive list of large corpora comes to us through the Brigham Young University site: http://corpus.byu.edu/
English corpus | # words | language/dialect | time period |
Hansard Corpus (British Parliament) | 1.6 billion | British | 1803-2005 |
Wikipedia Corpus (with virtual corpora) | 1.9 billion | English | -2014 |
| 1.9 billion | 20 countries | 2012-13 |
| 520 million | American | 1990-2015 |
| 400 million | American | 1810-2009 |
| 100 million | American | 1923-2006 |
| 100 million | American | 2001-2012 |
| 100 million | British | 1980s-1993 |
| 50 million | Canadian | 1970s-2000s |
Oxford English Dictionary: http://www.oed.com/ (not free, but freely accessible to students and faculty through many academic libraries)
Oxford makes the OED available online by subscription, or on CD-ROM. It remains the authoritative source of lexical, semantic, phonetic, and etymological information on the English language.
PIE (Phrases in English): http://phrasesinenglish.org/
Web interface based on phrases from the British National Corpus (BNC). Search for frequently co-occurring word sequences (word clusters) of 2 to 8 words. You can search all clusters of a particular length, or only clusters containing a particular word, phrase, or part of speech. Results include cluster lists with frequency statistics and key word in context (KWIC) concordances of the clusters.
WebCorp: http://wse1.webcorp.org.uk/
Uses the entire Web as the corpus (basis: Google). Search by word, phrase, or wildcard. Offers KWIC concordances, word lists, and some good advanced features. Disadvantage: not language-specific.
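To make the idea of word clusters with frequency statistics more concrete, here is a minimal Python sketch that counts every sequence of n consecutive words in a short text. It is not how PIE or WebCorp work internally; the function names `tokenize` and `word_clusters`, the simple regular-expression tokenizer, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def word_clusters(tokens, n):
    """Count every run of n consecutive tokens (an n-word cluster)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical sample text; replace it with any text you want to explore.
sample = "the cat sat on the mat and the cat sat on the rug"
clusters = word_clusters(tokenize(sample), 3)

# Print the most frequent three-word clusters with their counts.
for cluster, freq in clusters.most_common(3):
    print(" ".join(cluster), freq)
```

Changing the second argument of `word_clusters` to any value from 2 to 8 would roughly mirror the cluster lengths that PIE offers, although a real corpus tool also handles punctuation, part-of-speech tags, and much larger data.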
The archives listed below offer a variety of texts and smaller corpora for download. To search them with corpus analysis methods, you will normally need an offline text/corpus analysis tool, i.e. a concordancer. Alternatively, you may be able to carry out some simple analyses with online text analysis tools.
Corpora for learning: http://www.corpora4learning.net/
Links and references for the use of corpora, corpus linguistics and corpus analysis in the context of language learning and teaching. It also links to ongoing research and development projects.
Compleat Lexical Tutor: http://www.lextutor.ca/
The 'text-based concordances' section lets you analyze your own text and produces a KWIC concordance for each word in it. See also the 'phrase extractor' section to build concordances of word clusters.
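For readers who have not seen one, a KWIC concordance simply lists every occurrence of a search word with a fixed amount of context on either side, aligned on the keyword. The Python sketch below shows the basic idea under simple assumptions: the function name `kwic`, the character-based context window, and the sample text are hypothetical and are not taken from any of the tools listed here.

```python
import re


def kwic(text, keyword, width=30):
    """Return one line per occurrence of keyword, with up to `width`
    characters of context on each side and the keyword in brackets."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in one column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines


# Hypothetical sample text; replace it with your own.
sample = "A corpus is a collection of texts. Each corpus has its own interface."
hits = kwic(sample, "corpus", width=12)
for line in hits:
    print(line)
```

A dedicated concordancer adds sorting, wildcard searches, and frequency lists on top of this, but the aligned keyword column is the core of the display.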