Sandra Gollin Kies
Department of Languages and Literature
Benedictine University



Daniel Kies
Department of English
College of DuPage



Cameron Nazer Mozafari
Department of English
U of Maryland, College Park



Corpus-Based Approaches to Writing: CCCC 2016 Workshop



Corpora

Free corpora

The corpora below can be searched freely online. Each comes with its own search engine or interface and its own set of features. Some of the websites offer searches in more than one corpus.

Probably the most comprehensive list of large corpora comes to us through the Brigham Young University site: http://corpus.byu.edu/

English corpus                                  # words      Language/dialect  Time period
Hansard Corpus (British Parliament)             1.6 billion  British           1803-2005
Wikipedia Corpus (with virtual corpora)         1.9 billion  English           up to 2014
Global Web-Based English (GloWbE)               1.9 billion  20 countries      2012-13
Corpus of Contemporary American English (COCA)  520 million  American          1990-2015
Corpus of Historical American English (COHA)    400 million  American          1810-2009
TIME Magazine Corpus                            100 million  American          1923-2006
Corpus of American Soap Operas                  100 million  American          2001-2012
British National Corpus (BYU-BNC)               100 million  British           1980s-1993
Strathy Corpus (Canada)                         50 million   Canadian          1970s-2000s

Oxford English Dictionary: http://www.oed.com/ (not free, but freely accessible through many academic libraries for students and faculty)

Oxford makes the OED available online at http://www.oed.com/ by subscription, or on CD-ROM. It remains the authoritative source of lexical, semantic, phonetic, and etymological information on the English language.

PIE (Phrases in English): http://phrasesinenglish.org/

Web interface based on BNC phrases. Search for frequently co-occurring word sequences (word clusters) of 2 to 8 words. You can search all clusters of a particular length, or only clusters containing a particular word, phrase, or part of speech. Results include cluster lists with frequency statistics and key word in context (KWIC) concordances of the clusters.
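For readers who want to see what a word-cluster list involves, here is a minimal Python sketch that counts clusters of a chosen length in a short text. It is only an illustration of the general idea, not how PIE itself works; the function names `tokenize` and `clusters` and the sample sentence are made up for this example, and the tokenizer is deliberately crude.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def clusters(tokens, n):
    """Count every contiguous sequence of n tokens (an n-word cluster)."""
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Hypothetical sample text, standing in for a real corpus file.
sample = "On the other hand, the data show that on the other hand is common."
counts = clusters(tokenize(sample), 4)

# Print the most frequent 4-word clusters with their frequencies.
for cluster, freq in counts.most_common(3):
    print(freq, " ".join(cluster))
```

Changing the second argument of `clusters` from 4 to any value between 2 and 8 gives cluster lists of the other lengths mentioned above.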

WebCorp: http://wse1.webcorp.org.uk/

Uses the entire Web as the corpus (basis: Google). Search by word, phrase, or wildcard. Offers KWIC concordances, word lists, and some good advanced features. Disadvantage: not language-specific.

Literary text, media, and other archives

The archives listed below offer a variety of texts and smaller corpora for download. To search them with corpus analysis methods, you will normally need an offline text/corpus analysis tool, i.e. a concordancer. Alternatively, you may be able to carry out some simple analyses with online text analysis tools.

Corpora for language learning

Corpora for learning: http://www.corpora4learning.net/

Links and references for the use of corpora, corpus linguistics and corpus analysis in the context of language learning and teaching. It also links to ongoing research and development projects.

Compleat Lexical Tutor: http://www.lextutor.ca/

In the 'text-based concordances' section you can analyze your own text: the tool builds a KWIC concordance for each word in the text. See also the 'phrase extractor' section to build concordances of word clusters.
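To make the idea of a KWIC concordance concrete, the following Python sketch builds one for a single search word in your own text. It is a rough approximation for illustration, not the method used by the Compleat Lexical Tutor; the function name `kwic`, the `width` parameter, and the sample text are assumptions introduced here.

```python
import re


def kwic(text, keyword, width=30):
    """Return one concordance line for each occurrence of keyword.

    Each line shows up to `width` characters of context on either side,
    padded so that the keyword lines up in a single column.
    """
    text = " ".join(text.split())  # collapse line breaks and extra spaces
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


# Hypothetical sample text; replace with the contents of your own file.
sample = "The corpus is large. A corpus can be small. Corpus tools help."
for line in kwic(sample, "corpus"):
    print(line)
```

Because the search word is centered in every line, patterns in the words immediately before and after it are easy to scan by eye, which is the main point of the KWIC layout.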

Next, let's continue by discussing some methods for using the tools and the corpora.