Learning English with the British National Corpus

Guy Aston

Scuola Superiore di Lingue Moderne per Interpreti e Traduttori
University of Bologna

Paper presented at 6th Jornada de Corpus, UPF, Barcelona, May 1998

1. Moving corpora on stage in language pedagogy

As Widdowson (1989) has pointed out, the relationship between linguistics and applied linguistics is not simply a matter of applying linguistic theory to language teaching. Applied linguistics has its own concerns and its own criteria of relevance, in the light of which the methods and findings of linguistics need to be interpreted. Corpus linguistics, which over the last thirty years has come to have a significant impact on linguistic thinking, poses precisely these problems of interpretation. Leech (1992: 106) defines a corpus as "a helluva lot of text, stored on a computer". From an applied linguistic perspective, the central question is whether there is a hell of a lot the teacher and learner can do with one.

If we look back at the uses which have so far been made of computer corpora in language teaching, we can distinguish two main lines of approach. The first, which we might term a behind-the-scenes approach, has seen corpora used by publishers and researchers in developing syllabuses, materials and reference works for language learning - typically by focussing on the most frequent items and uses of those items to be found in corpora. This approach has been particularly influential in the production of reference works: in the wake of the pioneering COBUILD project, all the principal learner dictionaries of English now proclaim themselves to be 'corpus-based'. There have also been various initiatives in the design of syllabuses and of classroom materials which have drawn on corpus data as a means of selecting and grading their linguistic content (e.g. Willis 1990, Willis and Willis 1987, Mindt 1997).

The behind-the-scenes approach has generally been characterised by the use of very large corpora and of sophisticated software, whose development has required massive financial investment and considerable linguistic and computational expertise. What is now the Bank of English, developed under the COBUILD project, and the largest corpus of contemporary English, already ran to some 20 million words in the mid-eighties and now exceeds 300 million - a quantity which is probably approaching the lifetime linguistic experience of the average person. The size and complexity of such resources, along with the need to protect commercial investment, have meant that large corpora have only been accessible to a limited group of researchers, with the relationship between the corpus and end-users - classroom teachers and learners - being mediated and controlled by experts. The end-user has only had access to the products of corpus analysis, and not to the processes which give rise to them. Thus publishers and researchers have stated that products based on corpus analyses are descriptively superior, but the end-user has had no possibility of performing these analyses and verifying their superiority directly. Only in certain ESP applications have smaller corpora been used to draft syllabuses for particular domains (e.g. Flowerdew 1993), potentially allowing for replication.

The second approach, which we may term the on-stage approach, has instead attempted to bring corpora and corpus analysis directly into the teaching and learning environment. Its principal exponent, Tim Johns (based, like the COBUILD project, at the University of Birmingham), has coined the term "data-driven learning" to describe a discovery procedure where learners inductively derive and deductively apply generalisations by categorising data from corpora (Johns 1991). This procedure finds a justification in recent work in second language acquisition theory, which highlights the effectiveness of inductive learning from multiple examples (Ellis 1996, Skehan 1998), and it also fits with many of the premises of communicative language teaching, since it promotes a schematic view of linguistic knowledge and of language use (Aston 1995). Data-driven learning lends itself both to work where the teacher provides concordance data for learners to analyse, as in Johns' "classroom concordancing" model (1991, 1994), and to work where learners extract data from the corpus for themselves, be this in the classroom or in self-access contexts (Jordan 1992). It can also give rise to a range of communicative activities by providing "reasoning gaps" (Prabhu 1987) which learners must bridge, as they agree on how to interrogate the corpus, how to identify regularities, and how to interpret findings (Bernardini 1997; forthcoming).

While providing learning opportunities of a theoretically valid nature, on-stage corpus use has tended, given the limited financial and technical resources of the average educational institution, to be based on relatively small corpora (of a few hundred thousand words at most: Flowerdew 1996) which have lacked the careful design of the large research corpora which dominate behind-the-scenes uses. This means that generalisations made from them are likely to be of limited value. For instance, one of the few published small corpora, MicroConcord Corpus A (Murison-Bowie 1993), consists of newspaper articles drawn from one year's issues of The Independent. While it may tell us something about that year of that newspaper, it will not allow reliable conclusions to be drawn about newspaper language in general, and obviously not about uses in other registers. The limited size and opportunistic construction of such corpora make them an inherently less reliable basis for generalisation than large research ones, particularly as far as features which are relatively uncommon and/or unevenly dispersed are concerned.

The divide between the behind-the-scenes and on-stage approaches is currently diminishing, however. Thanks to changes in policy, and the growth of computer networking, large research corpora are now becoming more generally accessible, and consequently available for on-stage use. It is now possible to consult two large corpora of English over the Internet, at relatively low cost and using relatively straightforward software, offering greater reliability for on-stage work in the classroom or in self-access, with better documentation of less common features across a wider range of texts and text-types. On the one hand, this allows generalisations derived from small corpora to be tested and broadened, and on the other, as I hope to demonstrate, it allows for a greater variety of learning activities. In this paper I illustrate some on-stage uses of the British National Corpus (BNC), which is now freely available in Europe for non-commercial research purposes - including research by teachers and language learners.

2. The British National Corpus

(note 1)

The BNC consists of approximately 100 million words of contemporary British English, taken from over 4100 texts of different types, spoken and written: the spoken component, in the form of transcriptions, runs to 10% of the total (for more details on the composition of the corpus, see Aston and Burnard 1998). The corpus is marked up with information as to the nature, source and structure of each text, and each word is annotated to show its part-of-speech. All this additional information is given in SGML (Standard Generalised Markup Language) tags between angle brackets, thereby distinguishing the markup from the words of the text itself. The complexity of the markup underlies much of the BNC's potential for on-stage use in language pedagogy, and I shall therefore begin by briefly describing it. (NOTE: For formatting reasons, SGML tags in the HTML version of this paper are shown between square rather than between angle brackets, except in the .gif images in figures 1 and 4 below.)

Figure 1 shows some of the principal features of written text documents in the BNC, each of which corresponds to one written text.

Figure 1
Each document is marked up as a single [bncDoc] element, which contains a [header] element and a [text] element. The [header] contains information about the text - bibliographic details concerning its source, and its categorisation along such parameters as domain (topic), medium (published or unpublished, book or periodical, etc.), type of author, etc. - while the [text] element contains the text itself, which is divided into [div0] elements representing major sections, such as the chapters of a book or the articles in a newspaper, in turn divided into [div1] elements representing sub-sections, in turn divided into [div2] elements, and so on. Each of these divisions may contain a [head] element (the section heading), and a series of [p] elements representing paragraphs. These [head] and [p] elements must contain a series of [s] (sentence) elements, which are in turn made up of words ([w] elements) and punctuation ([c] elements).

Figure 2 illustrates the low-level structure of part of a written text.

Figure 2

[div2 complete=Y org=SEQ r=bx] [head type=MAIN] [s n=1202]
[w PRP]IN [w AT0]THE [w NN1]BEGINNING[c PUN]&hellip; [/head]
[p] [s n=1203] [w AT0]The [w NN1]word [w NN2]jeans
[w VVZ]originates [w PRP]from [w AT0]the [w NN1]place 
[w NN1]name [w NN1-NP0]Genoa [w AVQ-CJS]where 
[w NN2]sailors [w PRP]from [w AT0]the [w NN1]port 
[w VVD-VVN]hit [w PRP]on [w AT0]the [w NN1]idea [w PRF]of
[w VVG]making [w NN2]trousers [w PRP]from [w AT0]the 
[w AJ0]sturdy [w NN1]sailcloth[c PUN]. 

It shows the beginning of a [div2] element (the values of whose attributes show that it is complete and sequentially organised). This section starts with a main heading that contains the 1202nd sentence in this text. This sentence contains a preposition (IN), a definite article (THE), a singular common noun (BEGINNING) and an ellipsis (three dots). The heading ends at this point (a slash following the opening bracket marks the end of the element in question). It is followed by a new paragraph, which begins with the 1203rd sentence, which begins with the definite article, and so on. All this information need not of course be displayed, and for many purposes it will be more convenient to view it on the screen as in Figure 3:

Figure 3

       The word jeans originates from the place name Genoa where
sailors from the port hit on the idea of making trousers from
the sturdy sailcloth. Denim is the name of the blue woven
cloth first made in the French town of Nimes. So now you know! 
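
The step from the tagged form in Figure 2 to the plain display in Figure 3 can be sketched in a few lines of Python. This is only an illustration and makes no claim about how any BNC software works internally: it assumes the square-bracket notation used in this paper (the corpus itself uses angle brackets), and the names TOKEN, parse_tokens and plain_text are hypothetical.

```python
import re

# One [w POS]word or [c POS]punctuation token, in the square-bracket
# notation of this paper (an assumption; the corpus uses angle brackets).
TOKEN = re.compile(r"\[(w|c) ([A-Z0-9-]+)\]([^\[\s]+)")

# The paragraph shown in Figure 2.
SAMPLE = (
    "[p] [s n=1203] [w AT0]The [w NN1]word [w NN2]jeans "
    "[w VVZ]originates [w PRP]from [w AT0]the [w NN1]place "
    "[w NN1]name [w NN1-NP0]Genoa [w AVQ-CJS]where "
    "[w NN2]sailors [w PRP]from [w AT0]the [w NN1]port "
    "[w VVD-VVN]hit [w PRP]on [w AT0]the [w NN1]idea [w PRF]of "
    "[w VVG]making [w NN2]trousers [w PRP]from [w AT0]the "
    "[w AJ0]sturdy [w NN1]sailcloth[c PUN]."
)

def parse_tokens(markup):
    """Return (kind, pos, form) triples for every word and punctuation mark."""
    return [m.groups() for m in TOKEN.finditer(markup)]

def plain_text(markup):
    """Rebuild a readable line: words are space-separated, punctuation is attached."""
    out = ""
    for kind, _pos, form in parse_tokens(markup):
        if kind == "w" and out:
            out += " "
        out += form
    return out

tokens = parse_tokens(SAMPLE)
print(tokens[:3])
print(plain_text(SAMPLE))
```

Keeping the part-of-speech code alongside each form, rather than discarding it, is what allows the same data to serve both for display and for searches restricted by word class.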

The typical structure of spoken text documents is similar, and is shown in Figure 4.

Figure 4
The [header] here also contains information concerning the various participants in the interaction, and it is followed by an [stext] (spoken text) element. The latter consists of [div] elements representing different events or conversations, which in turn consist of [u] (utterance) elements representing turns at talk. Spoken texts may also contain non-verbal features such as laughter and coughing, and paralinguistic ones, such as shifts in voice quality, pauses, cut-offs and overlaps. They may also contain indications of unclear segments and editorial omissions in the transcript. Figure 5 shows an extract from a spoken text: for each utterance, the speaker is identified by the value of the who attribute on the [u] element, and the beginnings (and endings) of mutually overlapping sections are marked by [ptr] elements whose t attributes share the same value.

Figure 5

[u who=PS6M6] [s n=079] [w CJC]And [w PNP]I [w PNP]I
[w VVB]mean [w PNP]it [w VBD]was [w AV0]absolutely
[w AJ0]gorgeous[c PUN]. [s n=080] [w PNI]Everything[c PUN],
[w AT0]the [w NN2]railings [w PNP]you [w VVB]know[c PUN],
[w AVQ-CJS]when [w PNP]they [w VVB]put [w AVQ-CJS]when
[w PNP]they [w VVD]painted [w AT0]the [w NN2]railings[c PUN],
[w AT0]the [w AJ0]burned [w AT0]the [w AJ0]old [w NN1]paint
[w AVP]off[c PUN], [ptr t=KNHLC00U] [w AT0]the [w AJ0]new
[w NN1]paint [w AVP-PRP]on [ptr t=KNHLC00V][c PUN]. [/u]
[u who=PS6M7] [s n=081] [ptr t=KNHLC00U] [w ITJ]Ah [w ITJ]yes
[ptr t=KNHLC00V] [w ITJ]yes [w ITJ]yes [w ITJ]yes
[w ITJ]yes[c PUN]. [/u] [u who=PS6M6] [s n=082]
[w AV0]Now[c PUN], [w AJ0]old [w NN1]paint [ptr t=KNHLC00W]
[w AV0]just [w AV0]straight [w AVP]on [w AJ0-NN1]top
[ptr t=KNHLC00X] [/u] [u who=PS6M7] [ptr t=KNHLC00W] [unclear]
[ptr t=KNHLC00X] [/u] [u who=PS6M6] [s n=083]
[w ITJ]Aye[c PUN]. [s n=084] [w PNP]It [w AV0]just [w PNP]it
[w VVZ]looks [w AJ0]terrible[c PUN]. [/u]

Figure 6 shows a simplified display of this extract: the carets correspond to [ptr] elements, delimiting overlaps, while the ellipsis in parentheses indicates an unclear segment.

Figure 6
{PS27J}:    And I I mean it was absolutely gorgeous.  
Everything, the railings you know, when they put when they
painted the railings, the burned the old paint off, ^ the new
paint on ^.   
{PS27K}:    ^ Ah yes ^ yes yes yes yes.   
{PS27J}:    Now, old paint ^ just straight on top ^   
{PS27K}:    ^ (...) ^   
{PS27J}:    Aye. It just it looks terrible.   
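
The conversion from the markup of Figure 5 to a simplified display of the kind shown in Figure 6 can be sketched as follows. This is a rough illustration under stated assumptions, not a description of the actual display software: it works on the square-bracket notation of this paper, takes speaker labels directly from the who attribute, and the names UTTERANCE, simplify and display are hypothetical.

```python
import re

# Two utterances taken from the extract in Figure 5.
SAMPLE = (
    "[u who=PS6M6] [s n=082] [w AV0]Now[c PUN], [w AJ0]old [w NN1]paint "
    "[ptr t=KNHLC00W] [w AV0]just [w AV0]straight [w AVP]on [w AJ0-NN1]top "
    "[ptr t=KNHLC00X] [/u] [u who=PS6M7] [ptr t=KNHLC00W] [unclear] "
    "[ptr t=KNHLC00X] [/u]"
)

# Speaker code and body of each [u] ... [/u] element.
UTTERANCE = re.compile(r"\[u who=(\w+)\](.*?)\[/u\]", re.S)

def simplify(body):
    """Turn the body of one utterance into a plain display string."""
    body = re.sub(r"\[ptr [^\]]*\]", "^", body)   # overlap boundaries become carets
    body = body.replace("[unclear]", "(...)")     # unclear segment
    body = re.sub(r"\[[wc] [^\]]*\]", "", body)   # drop word/punctuation tags, keep the forms
    body = re.sub(r"\[[^\]]*\]", "", body)        # drop any remaining tag, e.g. [s n=082]
    return " ".join(body.split())                 # normalise whitespace

def display(markup):
    """Return one '{speaker}: text' line per utterance."""
    return [f"{{{who}}}: {simplify(body)}" for who, body in UTTERANCE.findall(markup)]

for line in display(SAMPLE):
    print(line)
```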

This detailed markup makes the BNC a very flexible instrument. The encoding of text structure means that it is possible to search not only for all the occurrences and co-occurrences of words or phrases in the corpus as a whole, but also for ones in certain structural positions (for instance co-occurrences within the same sentence, occurrences at the beginning/end of paragraphs/utterances, or following a pause), as well as ones with particular part-of-speech values. Similarly, the information in the header allows the user to restrict a search to certain texts or types of text, or to the speech or writing of certain participants or categories of participants. As we shall see, this can provide material for a wide variety of activities.
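
A minimal sketch may make the idea of structurally and grammatically restricted searches more concrete. It assumes that the text has already been split into sentences of (form, part-of-speech) pairs; the two sentences are invented rather than taken from the BNC, and the function name find and its parameters are hypothetical.

```python
# Each sentence is a list of (form, pos) pairs. Both sentences are invented
# for illustration and are not corpus citations.
SENTENCES = [
    [("It", "PNP"), ("was", "VBD"), ("pretty", "AV0"), ("unlikely", "AJ0"), (".", "PUN")],
    [("Pretty", "AJ0"), ("flowers", "NN2"), ("grow", "VVB"), ("there", "AV0"), (".", "PUN")],
]

def find(sentences, form, pos=None, sentence_initial=False):
    """Return (sentence_index, token_index) for each matching token.

    form is matched case-insensitively; pos, if given, must match exactly;
    sentence_initial restricts hits to the first token of a sentence.
    """
    hits = []
    for s_idx, sentence in enumerate(sentences):
        for t_idx, (tok_form, tok_pos) in enumerate(sentence):
            if tok_form.lower() != form.lower():
                continue
            if pos is not None and tok_pos != pos:
                continue
            if sentence_initial and t_idx != 0:
                continue
            hits.append((s_idx, t_idx))
    return hits

print(find(SENTENCES, "pretty"))                         # every occurrence
print(find(SENTENCES, "pretty", pos="AV0"))              # adverb only
print(find(SENTENCES, "pretty", sentence_initial=True))  # sentence-initial only
```

Exact matching on the part-of-speech code is a simplification: the hyphenated codes visible in Figure 2 (such as AVQ-CJS) would need a looser comparison.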

3. Four ways to learn from the BNC

In this section I illustrate four types of on-stage applications of the BNC for language learning. All are based on using SARA, the specially-designed software for interrogating the BNC over networks (note 2).

3.1 Contingent reference

Perhaps the most obvious way in which the BNC can be used by teachers and learners is as a reference tool in text production or reception, specifically during activities of writing, reading, and translation. Given its size and variety, the corpus can frequently provide solutions to specific problems which may emerge, as an alternative and/or complement to conventional reference tools such as dictionaries, grammars, and encyclopaedias. However, the user needs to think carefully about how to formulate the necessary queries and how to interpret the data provided, as the examples below will illustrate.

3.1.1 Reference in text production

One area where the corpus can provide useful information concerns the frequency of alternative forms, including near-synonyms. Take, for instance, the spelling of nominal compounds, which is extremely varied in English. A learner who is uncertain as to whether to write the expression wristwatch as one word, with a hyphen, or as two words, can discover by searching for these forms in the BNC that they occur 74, 16 and 12 times respectively. This suggests that the one-word form is quite the most usual. However, these figures do not tell us how many of these occurrences are in spoken texts, where spelling may be at the whim of the transcriber, or how they are dispersed across different written texts, in each of which orthography may be standardised by the author or publisher. Limiting the search to [text] elements (i.e. written texts) and taking only one solution from each text, we obtain slightly different numbers: 56, 13 and 9 occurrences respectively. The proportions for the different forms are still roughly 12:3:2, confirming that the one-word form should probably be adopted by the learner, ceteris paribus. However, it is easy to imagine how the numbers of texts might have told a different story from the numbers of occurrences: even with such an apparently simple problem the user needs to consider how queries can best be formulated to provide relevant information.
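
The difference between counting occurrences and counting texts can be illustrated with a small sketch. The hits below are invented and do not reproduce the BNC figures quoted above; the text identifiers and the function names are hypothetical, and "one solution from each text" is interpreted here as one hit per text for each spelling.

```python
from collections import Counter

# Hypothetical hits as (text_id, spelling) pairs; identifiers and counts
# are invented for illustration only.
HITS = [
    ("T01", "wristwatch"), ("T01", "wristwatch"), ("T01", "wristwatch"),
    ("T02", "wrist-watch"),
    ("T03", "wrist watch"), ("T03", "wrist watch"),
    ("T04", "wristwatch"),
]

def count_occurrences(hits):
    """Count every hit, however many come from the same text."""
    return Counter(spelling for _text, spelling in hits)

def count_texts(hits):
    """Count each spelling at most once per text."""
    return Counter(spelling for _text, spelling in set(hits))

print(count_occurrences(HITS))
print(count_texts(HITS))
```

In this toy data a single text accounts for three of the four one-word hits, which is exactly the kind of skew that the per-text count is meant to expose.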

As well as providing information about the frequency of particular forms, the corpus can provide information about the frequency of particular collocates. This can cast light on synonym use. Suppose a learner is uncertain whether s/he should talk of pursuing or chasing an objective. A search for forms of pursue (pursue, pursued, pursues, pursuing) occurring within a span of nine words on either side of objective or objectives, finds 100 solutions, whereas an equivalent query for forms of chase (chase, chased, chases, chasing) finds only one. The difference here seems large enough to dispel all doubt as to the more appropriate choice. However, this inference still depends on the appropriacy of the query for the purpose at hand - for example, whether it is appropriate to include the alternative forms of the lemma (chase can be a noun as well as a verb), and to use a span of nine words. To take another example, if we are concerned to discover whether we might better describe a man as beautiful or handsome, the latter is much more common as a collocate of man within a span of two words, but much less so within a span of nine words. The reason is that beautiful is a much more frequent word than handsome in the corpus as a whole, and therefore more likely to appear in the non-adjacent context (Aston and Burnard 1998: 82-84). In the case of pursue and chase, the marked difference in frequency is still present with smaller spans.
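
The effect of the span on a collocation query can be sketched as follows. The sentence is invented, the function name cooccurrences is hypothetical, and the sketch counts node words with at least one collocate in the window, which is an assumption about how solutions are counted rather than a description of SARA.

```python
def cooccurrences(tokens, node_forms, collocate_forms, span):
    """Count node tokens that have a collocate within `span` words on either side."""
    node_forms = {f.lower() for f in node_forms}
    collocate_forms = {f.lower() for f in collocate_forms}
    words = [t.lower() for t in tokens]
    count = 0
    for i, word in enumerate(words):
        if word not in node_forms:
            continue
        # Up to `span` words to the left and to the right, excluding the node itself.
        window = words[max(0, i - span):i] + words[i + 1:i + 1 + span]
        if any(w in collocate_forms for w in window):
            count += 1
    return count

# Invented sentence, not a corpus citation.
TEXT = "The firm has pursued this objective for years and is still pursuing it".split()
PURSUE = ["pursue", "pursued", "pursues", "pursuing"]
OBJECTIVE = ["objective", "objectives"]

print(cooccurrences(TEXT, OBJECTIVE, PURSUE, span=9))  # wide span
print(cooccurrences(TEXT, OBJECTIVE, PURSUE, span=1))  # adjacent words only
```

Running the same query with several spans, as the handsome/beautiful example suggests, is a cheap way of checking whether a result depends on the window chosen.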

As well as specific collocates, the corpus can highlight syntactic and semantic patternings. Discussing the respective economic prospects of teachers, interpreters, and translators, one student came up with the sentence Translators earn far and away the least. Recourse to the BNC showed that there were 73 occurrences of far and away, and that the majority preceded superlatives, confirming this colligational pattern. Looking more closely at the behaviour of the expression in a random 30 concordance lines, however, revealed that patterning was semantic as well as syntactic. Far and away was almost always used to intensify adjectives and adverbs with positive connotations - having a positive semantic prosody, in Sinclair's (1991) terms (Figure 7).

Figure 7

in a full League season was to remain far and away the best by any Palace goalkeeper for over ha
a proudly acknowledged agency, and is far and away the most successful PR exercise (perhaps the
free and unfree peasants.These formed far and away the largest group in the population of Europ
enry I's time, that for 1129 - 30, is far and away the earliest royal account to survive in any
tgun. First point: netted rabbits are far and away more saleable. There is no shot in them, the
re twelve different types although by far and away the most common are called `liberty caps", s
h domestic market remains to this day far and away the largest consumer of Champagne.|
es, their vice-chancellors and deans. Far and away the most important powers, however, are thos
 unfortunately it looks a bit messy.| Far and away the most interesting aspect of this guitar i
uple, who worked together in the film Far and Away, are billed as Tinseltown's most romantic co
nd which in the long run had made her far and away the most loved of all the members of the Roy
possibly just down the road! It is by far and away the best single-source reference on this eve
 the West End premiere of their film, Far And Away.| The following day they trotted off to Laur
 nearest rival, Tesco, they've become far and away the most popular places to do the weekly sho
ty at the top level then Wright is by far and away ahead.|`Scoring at this level is not a one-se
hich Britain underwent in the 1980s?| Far and away the most important point is that the museums
aspire to the red jersey of Wales was far and away the most dashing thing you could do. Burton 
onships, but because it would have by far and away the largest European economy outside the EEC.
ondon market - in 1972 - but has been far and away the most consistently successful. It was hel
 enjoyed a lucrative tourist trade as far and away the most popular resort of pilgrimage, the s
erty, virtually accounts for what was far and away the greatest personal estate owned by any com
If there is a local one, then that is far and away the best place to go, otherwise there is no 
 next bend and a Fly-Drive package is far and away the most convenient and comfortable way to s
matter. The Chancellor of Germany, by far and away, in economic terms, the most powerful countr
ly, shows that the United Kingdom has far and away more undertakings with more than 1,000 emplo
. Biffen), whom I certainly regard as far and away the most successful Leader of the House in a
n. Member for Chesterfield, which was far and away the most interesting part of the debate -^ A
 into the fort with all its comforts. Far and away superior to those we had at our base RAF Hin
ular intervals. It was, of course, by far and away a situation too good to last and in time, gaz
to add to his laurels.| He has had by far and away his best season since moving to Newmarket fr

Furthermore, in these citations far and away occurred with verbs with stative meanings - be, have, remain, become, form, etc. - unlike the more process-oriented earn of the student's proposal. Overall, in this case the corpus data turned out not to support the option proposed by the learner. The analysis did, however, suggest some possible alternatives. This student managed to reformulate her sentence as Interpreters are far and away the most highly paid. This was a result of her recognising quite complex and abstract syntactic and semantic patterns in the data - as well as of discounting irrelevant instances, such as those where Far and away is the title of a film.
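
A concordance of the kind shown in Figure 7, together with the selection of a random subset of lines, can be approximated in a short sketch. The text is invented, the names kwic and sample_lines are hypothetical, and the fixed-width layout only imitates the general appearance of the figure.

```python
import random

def kwic(tokens, phrase, width=30):
    """Return one line per occurrence of `phrase`, with `width` characters of context."""
    phrase = [p.lower() for p in phrase]
    n = len(phrase)
    lines = []
    for i in range(len(tokens) - n + 1):
        if [t.lower() for t in tokens[i:i + n]] != phrase:
            continue
        left = " ".join(tokens[:i])[-width:]        # trailing part of the left context
        right = " ".join(tokens[i + n:])[:width]    # leading part of the right context
        node = " ".join(tokens[i:i + n])
        lines.append(f"{left:>{width}} {node} {right}")
    return lines

def sample_lines(lines, k, seed=0):
    """Pick up to k lines at random; the fixed seed makes the selection repeatable."""
    rng = random.Random(seed)
    return rng.sample(lines, min(k, len(lines)))

# Invented text, not a corpus citation.
TEXT = ("This is far and away the best place to go . "
        "Interpreters are far and away the most highly paid .").split()

lines = kwic(TEXT, ["far", "and", "away"])
for line in sample_lines(lines, k=30):
    print(line)
```

Aligning the node phrase in a fixed column is what lets recurrent patterns to its right (here the superlatives) stand out when the lines are read vertically.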

Another area where the corpus can provide evidence of appropriacy is with respect to register - though again careful thought may be necessary in designing queries and interpreting results. Should a learner writing an essay describe the probability of a plane crash as pretty unlikely? Or is such an expression too informal? Figure 8 shows the respective frequencies of pretty as an adverb in the whole corpus, in spoken texts, and in written texts from two different groupings of subject domains - on the one hand Imaginative and Leisure, on the other the remaining BNC domain categories (Arts, Belief and thought, Commerce and finance, Natural science, Applied science, Social science, World affairs) - a grouping we would expect to be generally more formal.

Figure 8

                           occurrences         million words       occurrences/
                                                                   million words

whole corpus                  4322                 100                  43

spoken                        1110                  10                 111

written (imaginative          2249                  30                  75
and leisure domains)

written (other domains)        963                  60                  16

Comparing the numbers of occurrences of the adverb pretty with the total numbers of words for each category, we find less use in writing than in speech, and the least in the more formal written domains, suggesting that the learner might be advised to avoid it in formal writing. Examining a random concordance of pretty as an adverb in the formal domain group enables this generalisation to be refined somewhat (Figure 9): pretty often seems to be used where, for some reason, the discourse shifts to a less formal, more conversational style - in direct speech and authorial asides, for example - and along with other markers of informality, such as contracted forms, first person singular pronouns, etc.
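
The normalised figures in the last column of Figure 8 follow from a simple division, which can be checked with a few lines of Python. The occurrence counts and approximate sizes are copied from Figure 8; the names FIGURE_8 and per_million are hypothetical.

```python
# (occurrences, size in millions of words), copied from Figure 8.
FIGURE_8 = {
    "whole corpus": (4322, 100),
    "spoken": (1110, 10),
    "written (imaginative and leisure domains)": (2249, 30),
    "written (other domains)": (963, 60),
}

def per_million(occurrences, million_words):
    """Normalised frequency: occurrences per million words."""
    return occurrences / million_words

for category, (occurrences, size) in FIGURE_8.items():
    # Rounded to the nearest whole number, as in the figure.
    print(f"{category:45} {round(per_million(occurrences, size)):4d}")
```

Normalising in this way is what makes the categories comparable: the raw count for the formal written group is close to that for speech, but it is spread over roughly six times as many words.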

Figure 9

be replaced.  Mr Morton, said he was `pretty confident" that would not happen.  |  If a d
 until eleven o'clock everything went pretty well, When just as you start thinking to you
n Tillage or pasture, and the Country pretty fully inhabited, it cannot be desirable that
.  However, Mr. Danse, the Vicar, was pretty shrewd and was able to strike a deal with Si
he evidence, as audience studies have pretty conclusively shown.  Indeed it is now uncont
ites.  All the internal organs looked pretty normal to the naked eye.  There were some gr
e sounds but leaving the overall feel pretty much unchanged.  |  In RMS/Soft Knee mode th
    47 Backchat    Mat Coward on PR - pretty ridiculous    47 Forteana    Paul Sieveking 
he intention is to deceive and we are pretty hard on that."  |  Caterham is still technic
cks, holographic jewellery - it looks pretty much like a theme park.  For the shoot, Coli
s.  Other people have to do something pretty dramatic for us to notice.  Putting a case m
aining and I rated the whole thing as pretty good."    |  An Ideal Husband | |  Ivan Wate
.  Only property above a minimum (and pretty exorbitant) price may be purchased.  You mig
Ross is keeping price and performance pretty much under cover, though there is talk of th
 of johs I ought to do - I used to be pretty thorough - and there are things I haven't go
 and cross country captain.  I got on pretty well with Reg Witter, the games and PE maste
achievement to say that his theory is pretty weird all the same.  It has to be to get the
who refused to use soap, and that was pretty horrible.  (Both used to lie in baths hoping
pt that in all probability it will be pretty well apparent to the reader quite soon who i
given a sentence.  Otherwise I got on pretty well, had a laugh: you had to.  I don't thin
of writing, sent one on later so it's pretty safe to assume that the trial was free.  Don
ck but, given a steady hand, it works pretty well with a mouse.  Once drawn, of course, t
ur advertisers to put on record that, pretty well without exception, they have a lively a
 used in the whole of world war two.  Pretty well the entire post-biblical civilian infra
 greater than their share value shows pretty clearly how much value corporate managers ca
time when the advertising cupboard is pretty bare.    |  Banbridge 21 CIYMS 8 | |  On a w
binding declaration was dismissed as "pretty irrelevant" by a UK government official.) ^ 

We can also, en passant, notice the recurrence of pretty well as a collocation - almost 20% of the occurrences of pretty as an adverb in these domains, as may be discovered using the SARA Collocation option. In comparison to a dictionary, here the corpus offers far more subtle information, potentially proposing variables which may have been missed in the learner's original formulation of the problem. In order for these to emerge, however, the learner must not expect the answer to simply leap out of the corpus: s/he needs to reflect on appropriate criteria to distinguish particular text-types, and to browse wider contexts than the single concordance line to distinguish particular discursive styles with confidence. Pretty may hold other surprises for the learner, as we shall see below.

3.1.2 Reference in text reception

So far we have considered the BNC as a tool to check and generate hypotheses in text production. Equally, it can serve as a means of enhancing understanding in reception. Unlike a dictionary, a corpus may not provide a definition for an unfamiliar item or sense, but it can give a much better idea of usage, including the extent to which such usage habitually varies. Thereby it may enable the learner to recognise whether the instance encountered is in fact untypical, and hence warrants a particular interpretation. Thus, when the adverb pretty, which as we have seen belongs to an informal register, occurs in a formal text-type, we can see it as potentially marking a shift in authorial stance, for instance to a collusive aside.

The example in Figure 10 comes from a headline in the Financial Times:

Figure 10

Profit warnings hit Tokyo markets
Collapse of Falichi Corp rekindles fears in banking sector

One problem here for the learner may be understanding the meaning of rekindles in this context. A randomly selected concordance of forms of the verb rekindle (Figure 11), of which there are a total of 147 occurrences in the corpus, shows that it is typically used metaphorically, as it is in this example.

Figure 11

gerac on Saturday.|Rugby Union: Young rekindles Waspish spirit||By BARRIE FAIRALL||Wasps....
or, said: `We have the opportunity to rekindle Liverpool's spirited sea-faring tradition an
dy Derby on June 3.| Just as the race rekindled Classic hopes for Stoute, the flame was snuf
ss close to Explorers, which hoped to rekindle pride in the old customs, language and tradi
 begin again.How is extinguished fire rekindled?It evaporates in a gaseous form from the Ear
 you..." Her voice faded as her words rekindled memories.| The trimphone extension warbled u
ack on track again."|`We hope it will rekindle the atmosphere of old, not just on the field
treet star Chris Quinten is trying to rekindle his career- by appearing in panto.| He will 
-riding Norwich. Kendall said: `We've rekindled the fans" hope and belief and eased their ap
o be restated. There is now a need to rekindle the idea that teaching is a vocation which m
60 Kennedy-Nixon debates is enough to rekindle the exaggerated sense of urgency then felt t
 was born in post-war Europe could be rekindled, larger and brighter in a post-cold-war worl
d memories, some of which he hopes to rekindle if his plans for a visit next year come to f
 picture-postcard thatched cottage is rekindling some very happy memories|| Home for novelist
restaurant. She hoped Angus wanted to rekindle their love affair, as she did.| Rules was de
mpt to shed his diplomatic veneer and rekindle memories of his early rough and tumble North
as good as died for him. That thought rekindled his fury, briefly. However dreadful his task
t long enough to sate his desires and rekindle her expectations. And so this grumbling thre
he one hand, such action would simply rekindle the international outcry that resulted in th
e it was extinguished. Our task is to rekindle it. Will you not help me?"| He paused again,
 progressed many old friendships were rekindled and new ones formed with `cross fertilisatio
 his intention to evict her, that had rekindled the dream in the first place.|`Then I'll say
iser Brendan Foster tipped his pal to rekindle memories of his glory days in his new event.
es from the Gulf, and thereby avoided rekindling the debate about the constitutionality of de
                                      REKINDLE AN AGE OF ELEGANCE| Here they are! The fines
reaks suggest that some may have been rekindled from underground smoulderings dating from at
irty years on, a book on Joe Meek has rekindled interest in Britain's first independent pop 
o early in the campaign.| In order to rekindle the title dream, the restoration of confiden
iverpool's players and supporters can rekindle the Auxerre spirit in front of a sell-out 38
situation that could cause stress and rekindle bitter feelings."| Single parents will be ob

The kinds of things that are rekindled are emotional states - hope(s), interest, memories and the like. The connotations of rekindle seem generally positive, but there are enough negative examples to suggest that this prosody is not constant - we also find bitter feelings, fury and international outcry, for example.

The positive semantic prosody for rekindle emerges strongly if we examine the occurrences with its most frequent collocate, memories (Figure 12).

Figure 12

oleaxed."Paul added that it rekindled memories of a Borussia Moenchengladbach v Inter Milan
nversations, many happy and formative memories can be rekindled.| Or they may wish to discu
e dead. The sudden rekindling of past memories and passion for the man she had been about t
er voice faded as her words rekindled memories.| The trimphone extension warbled urgently f
acticality, interlaced with many fond memories, some of which he hopes to rekindle if his p
y loaned by Mr. E. Roberts) rekindled memories of the last down `Cornishman" which ran on S
ed his diplomatic veneer and rekindle memories of his early rough and tumble North Country 
. His pizza slices certainly rekindle memories of the good old days in football... they tas
he metropolis and beyond, to rekindle memories of times past.| Early arrivals heard one of 
htness about it as well. It rekindles memories of those old-fashioned Hollywood romances of
dan Foster tipped his pal to rekindle memories of his glory days in his new event.| Eight y
o 302 all out in 47 balls to rekindle memories of their Cup disaster last month when they l
ar.| The 12-strong cast will rekindle memories of the Andrews Sisters, Tommy Handley, Rita 
ly, that the programme would rekindle memories of the singles holiday in Torremolinos or th

Nearly all the 15 citations suggest nostalgia, with a revival of happy/fond memories of the good old days/times past/glory days. The same nostalgic sense seems present in a number of the other citations in Figure 11 - the atmosphere of old, for instance.

On the other hand, nostalgia would hardly seem to be at issue where negative emotions are involved. Looking at these instances (Figure 13),

Figure 13

al" about the BMA's backdown to avoid rekindling the controversy.They are keenly aware the BM
eek after Aldershot were wound up and rekindle fears for several Fourth Division clubs faci
at the heart of Europe".|It will also rekindle suspicions among the Euro-sceptical wing of 
vements of people, exacerbated by the rekindling of the civil war between the north and the s
ted, and the media hype threatened to rekindle itself. As if frightened by more unwanted ex
ease with which the nationalists have rekindled historical resentment and traditional chauvi
peacekeeping operation in Croatia and rekindle the flames there. The position of all minori
60 Kennedy-Nixon debates is enough to rekindle the exaggerated sense of urgency then felt t
gnalled its intention to press ahead, rekindling the fury of the country's 4,300 mostly white
 guise of financial conglomerates has rekindled this debate. Nine types of conflict of inter
tion that texts be used in such a way rekindled related anxieties. But the issue was now rai
as good as died for him. That thought rekindled his fury, briefly. However dreadful his task
nce that German nationalism should be rekindled at the very time we're about to reduce our t
he one hand, such action would simply rekindle the international outcry that resulted in th
th every opportunity in the world for rekindling those ugly sparks of revolution.| Thank God 
smissal of Elise as a mere client had rekindled all her misgivings. And yet Luke's presence,
 into gear, and the glint in his eyes rekindled the unwelcome wildfire in her veins.| She no
tality - afraid that the memory would rekindle some private pain. He spared us both by refe
es from the Gulf, and thereby avoided rekindling the debate about the constitutionality of de
gh-Pemberton issued his warning about rekindling inflation, Downing Street abruptly changed i
situation that could cause stress and rekindle bitter feelings."| Single parents will be ob

what they seem to have in common is the position of the speaker, who takes a detached or even ironic stance with respect to the feelings described. From this perspective, the Financial Times headline can perhaps be seen as taking a certain distance from the emotions of the Tokyo stock market, and one wonders whether the same sub-editor would have used the expression rekindles fears to describe events in the City of London. Interestingly, a similar distancing appears present in some of the apparently positive examples: returning to the examples of rekindle memories, and looking at a rather larger context, we can see that some of these too appear to be ironic (Figure 14).

Figure 14

Mig Romerez did not even recognise a football when I showed
him one but his exotic appearance should be enough to impress
those bumpkins in `The Tip" crowd. His pizza slices certainly
rekindle memories of the good old days in football... they
taste like Dubbin.

Eldorado pitched itself to the tabloids as a `sun, sea, sex
and sangria" story. Hungry hacks were flown out to the set to
experience the four S's for themselves. It was hoped that the
Bonkidorm and Costa del Bonk set would tune in avidly, that
the programme would rekindle memories of the singles holiday
in Torremolinos or the villa trip to El Capistrano.

Overall, rekindle seems to be used in two contrasting ways: either the speaker/writer can identify with the (positive) emotional states described, or they can distance themselves from them. It is this second use which would explain its occurrence with negatively as well as positively connotated events. Such contrasting uses appear to be found with many clichéd expressions, and corpus examples can provide a useful way of helping learners appreciate them, and hence to decide the connotations of particular cases.

3.2 Studying a language issue

Like that of pretty, such a study of rekindle goes in many ways beyond solving a problem in interpreting or producing a specific text. It comes closer to a second type of corpus use, in which a particular linguistic feature or group of features is studied for its own sake, in order to learn how it works in the language. The aim of such study is not to rival the work of the professional lexicographer or grammarian, but to deepen understanding of the feature or features in question through personal discovery. For further examples, we may return to the concordance of rekindle memories (Figure 12 above), where we find several features which could be of interest. For instance, most learners will be familiar with the expression be fond of, but how many will feel comfortable with the attributive use of the adjective, as in fond memories (line 5)? By selecting a random sample of occurrences of fond, and then sorting them by the part-of-speech of the word which follows, we can group those where fond is followed by a noun, and then investigate the frequencies of particular nouns as collocates in this position (Figure 15).

Figure 15

It emerges that memories is quite the most frequent noun to follow fond, some way before farewell, farewells, memory and parents (this order is maintained even if we group the collocates semantically, including fathers, mothers, and other relatives with parents). Numerically, the 70 occurrences of fond memory/ies and 42 of fond farewell/s suggest that these forms may be worth memorising by the learner as fixed expressions. The corpus helps the learner not only to identify the most common uses of the feature being studied, but also to decide which may be worth learning and which not.
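The procedure described above (take occurrences of fond, keep those where a noun follows, and count the nouns) can be sketched in a few lines of Python. This is an illustration only: the tagged sample is invented rather than drawn from the BNC, the tag NN is a simplified stand-in for the corpus's noun tags, and the function name noun_collocates is hypothetical. SARA carries out the equivalent steps through its own query and sorting options.

```python
from collections import Counter

# Invented toy sample of (word, part-of-speech) pairs; not BNC data.
# "NN" is a simplified noun tag used only for this sketch.
SAMPLE = [
    [("she", "PRON"), ("has", "VERB"), ("fond", "ADJ"), ("memories", "NN")],
    [("a", "DET"), ("fond", "ADJ"), ("farewell", "NN")],
    [("he", "PRON"), ("is", "VERB"), ("fond", "ADJ"), ("of", "PREP"), ("her", "PRON")],
    [("fond", "ADJ"), ("memories", "NN"), ("of", "PREP"), ("home", "NN")],
]


def noun_collocates(sentences, node="fond", noun_tag="NN"):
    """Count the nouns that immediately follow `node` in tagged sentences."""
    counts = Counter()
    for sentence in sentences:
        # Pair each token with its right-hand neighbour.
        for (word, _), (next_word, next_tag) in zip(sentence, sentence[1:]):
            if word.lower() == node and next_tag == noun_tag:
                counts[next_word.lower()] += 1
    return counts


counts = noun_collocates(SAMPLE)
for noun, freq in counts.most_common():
    print(noun, freq)
```

The predicative use (fond of her) is excluded because the word that follows is not tagged as a noun, which is the same filtering that sorting by the following part-of-speech achieves.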

Another expression in the concordance of rekindle memories which learners may not know is glory days. While its denotation is easy enough to understand, the 42 occurrences in the corpus provide information as to its contexts of use of a less predictable nature (Figure 16).

Figure 16

    |  PEOPLE AND PLACES |  |  John's glory days | |  SOCCER player John Groves fears a broke
the club has had a brief taste of the glory days but now is immersed in the worst crisis in i
 it    Brooklands Today The circuit's glory days live on    A Week in a Bentley Brooklands To 
since they won the FA Cup back in the glory days of 1947.  |  Certainly not the army of suppo
din.  He should be fit.  |  Swindon's glory days in the FA Cup were a long time ago while Cam
tico Alberto" banners stored from the glory days of three years ago, when he won ten times in
its he cries when he sees film of the glory days in Italy when Gazza was ready to become the 
man to Bobby Robson in Ipswich Town's glory days in the early 80s, is wanted by Sunderland.  
Elland Road - just like it was in the glory days of super manager Don Revie and hard-man skip
itor of  Sounds  during its late '70s glory days.  More importantly, he had also run a pub, w
ources and the backing of Fiat, their glory days are in the past.  Last won the Constructors'
 would have missed out on all Rovers' glory days of promotion, Wembley and Europe.  |  The wa
king! | |  All-original hits from the glory days of pop, plus a FREE ALBUM of classical Elvis
John Dawes Room, with pictures of the glory days, the great man is optimistic.  `I shall be v
ouness has struggled to recapture the glory days at Anfield.  |  He was suspended for five ma
d his pal to rekindle memories of his glory days in his new event.  |  Eight years after sett
arly twenties, and well remember the `glory days" of Newcastle United with their world class 
s deserve my loyalty, says Bassett    GLORY DAYS...  Dave with the FA Cup won at Wimbledon   
have to be Bruce Springsteen singing `Glory days - well, they pass you by..."  |  Steve.    |
is own name for Stewart.  In 1971 the glory days returned as Stewart won six of the eleven ro
et its no coincidence that during our Glory Days we had the same players year in year out. ^ 
eegan's return.  |  And the man whose glory days at Goodison included League championship, FA
supporters brought up in the pre-1968 glory days are mostly content to support the White Rose
allowing in memories of the long-gone glory days.  (OK, so I'm guilty of psychic breaking and
ults cannot compare with those of the glory days of 1989, but nobody was complaining.  `It's 
y side desperately keen to revive the glory days of the late 80s.  |  The influence of Kilken
 support of the whole village." ^     GLORY DAYS: Ice star John Curry in 1976    |  LAST OF T

Apart from being the title of a song by Bruce Springsteen (one of the innumerable snippets of encyclopaedic knowledge that may be picked up from the BNC), glory days seems principally to refer to the past successes of sportsmen or sports teams, being primarily associated with sports journalism as a genre (bar the odd case from music journalism, where we have a similar meaning of group triumph). There is no occurrence in speech. Further queries indicate that its form is as fixed as its context, there being no cases of intervening modifiers between the two words, no instances of glory day, and only the occasional glory nights and glory years.

The advantage of using the corpus in this manner is that the learner is encouraged to investigate variation of form and function in relation to the dispersion of a feature in the language as a whole, rather than simply in relation to a specific context, as in contingent reference use. The learner who investigates the relationship between the use of an item and sociolinguistic factors may come to appreciate the importance of a range of situational variables. For instance, as well as indicating what kinds of texts an expression is used in, the corpus can also reveal what kinds of users employ it. In the spoken component of the BNC, there are 109 occurrences of the word navy. 51 are produced by male speakers, and 44 by female speakers (the remainder being by speakers whose identity is uncertain). As the total amounts of speech by male and female speakers in the BNC are very similar,(note 3) these figures might suggest that the word is fairly equally used by both sexes. However, when we distinguish between the nautical and colour senses of the word, we find a clear distinction, with the colour sense far more common in utterances by women. Or, to return to the example of pretty, we find that the word pretty is rather more frequent in men's speech (730 occurrences) than in women's (514).(note 4) However when we compare its use as an adjective and as an adverb, we find that 40% of female use is adjectival, whereas only 7% of male use is. This means that overall, pretty as an adverb occurs over twice as often in men's speech as in women's (681 vs 306 occurrences), while pretty as an adjective is over four times as common in women's speech as in men's (208 vs 49 occurrences). The learner who wishes (not) to conform to gender stereotypes might draw her/his own conclusions as to whether and how to use pretty in speech.
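The arithmetic behind the pretty comparison can be checked with a short calculation. The sketch below uses only the totals and adjectival counts quoted above; the variable names are illustrative, and raw counts are compared directly on the assumption, stated in the text, that the amounts of male and female speech in the corpus are very similar.

```python
# Figures quoted in the text for "pretty" in the spoken BNC.
total = {"men": 730, "women": 514}
adjective = {"men": 49, "women": 208}

# Adverbial uses are what remains once adjectival uses are removed.
adverb = {sex: total[sex] - adjective[sex] for sex in total}

# Share of each group's uses that are adjectival, as a percentage.
adjective_share = {sex: 100 * adjective[sex] / total[sex] for sex in total}

adverb_ratio = adverb["men"] / adverb["women"]            # men relative to women
adjective_ratio = adjective["women"] / adjective["men"]   # women relative to men

print(adverb)
print({sex: round(share) for sex, share in adjective_share.items()})
print(round(adverb_ratio, 1), round(adjective_ratio, 1))
```

The adverbial counts come out at 681 and 306, and the two ratios at a little over two and a little over four, in line with the figures given in the paragraph.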

Similar comparisons can be made for different age groups: we find, for instance, that wicked has negative connotations for older speakers, but positive connotations for younger ones. And while such comparisons will of course not always provide relevant distinctions, discovering that they sometimes do may encourage learners to think about when they might be relevant, and to refine their use of the corpus accordingly.

3.3 Serendipitous exploration

A corpus like the BNC lends itself to browsing, rather as one might browse in a bookshop or library. Rather than just focussing on a single problem or feature, the user can explore serendipitously, passing freely from one curiosity to the next. In the last section we used the concordance of rekindle as a starting point for studies of other features, such as fond and glory days, which that concordance happened to contain. In their turn, these investigations might have led on to studies of further features - a brief taste and recapture, for instance, which are collocates of glory days (Figure 16). Virtually any concordance will throw up potential curiosities of this kind, and the SARA Browser option, which allows the user to scan the entire source text from which a citation is taken, further increases the opportunities for discovering them.

In these examples, exploration of the corpus is syntagmatic, in the sense that attention shifts from a previously searched-for feature to one in its context. However it is also possible to explore paradigmatically, shifting attention from a previously searched-for feature to a feature or features which are formally or semantically related to it. An investigation of pretty as an adverb invites comparison with pretty as an adjective, which, we saw in the last section, turn out to have very different distributions across male and female speakers. At the single word level, comparisons of this kind may be prompted by the listings provided from the corpus index of forms and of the parts of speech associated with them. For instance, searching for forms of the verb budge using the SARA Word Query option will display a list of all the word-forms in the corpus beginning with the letters budg - including not only budge, budged, budges, and budging but also budgerigar and budgerigars. A curious learner might be inclined to investigate these words, just as s/he might be inclined to investigate near synonyms of budge which come to mind, such as shift, or antonyms, such as stand. Formal and semantic association may prompt investigation of similar phrases as well as words: the concordance of rekindle memories (Figure 12 above) might stimulate not only an investigation into the collocate times past, but also a comparison with past times - which turns out to lack the former's nostalgic connotations. Other strands of serendipitous investigation include possible variants of a phrase (are there instances of glory weeks?), as well as varying positions in text structure (in headings, at the beginning/end of paragraphs/utterances), in different text-types or speaker-types, and even in specific texts or speakers. 
Once the learner has mastered the software, and realised the different kinds of information the corpus can provide, a combination of paradigmatic and syntagmatic exploration can become a routine leisure activity which proceeds happily and profitably for hours at a time.
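The budg example amounts to a prefix lookup in an alphabetical index of word-forms. A minimal sketch of that idea follows; the miniature index is invented, the function name forms_with_prefix is hypothetical, and nothing here is claimed about how SARA's Word Query option is actually implemented.

```python
from bisect import bisect_left

# Invented miniature word-form index; a real corpus index is far larger.
WORD_FORMS = sorted([
    "budge", "budged", "budgerigar", "budgerigars", "budges",
    "budging", "buddy", "buffalo", "bulge",
])


def forms_with_prefix(index, prefix):
    """Return every word-form in a sorted index that starts with `prefix`."""
    start = bisect_left(index, prefix)
    matches = []
    for form in index[start:]:
        if not form.startswith(prefix):
            break  # sorted order: no later form can match
        matches.append(form)
    return matches


print(forms_with_prefix(WORD_FORMS, "budg"))
```

As in the text, the query returns budgerigar and budgerigars alongside the forms of the verb, which is exactly the kind of unplanned neighbour that can prompt further exploration.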

3.4 Encyclopaedic and oracular use

The examples so far discussed have all supposed that the focus of interrogation will involve linguistic features of some kind. However other, non-linguistic approaches are also possible: as we have seen, the BNC can provide large quantities of encyclopaedic information. A search for the name of a person or place will usually throw up a range of interesting facts, from Manchester to Masoch. Corpora can also illustrate cultural stereotypes and prejudices, as Stubbs (1996) has pointed out. A learner might try searching for Irish or Kraut, or comparing the collocates of man and woman, or the use of racial, tribal and ethnic (Krishnamurthy 1996).

It is equally possible to use the corpus for less serious purposes, looking, for instance, for occurrences of one's own name, or of expressions related to topics of particular personal interest (beer, sex, linguistics, etc.). It can even act as a kind of oracle. By searching for occurrences of sentence-initial phrases (My problem is ..., What I want is ..., Why don't you ...), for instance, one can then examine their continuations for suitable advice. Occurrences of questions (How are you?, What's for dinner?) can similarly be examined for the responses which follow. Figure 17 shows a random selection of instances of one such oracular question.

Figure 17

ble, `Do you love me?"  With his wife he had known precisely where he was.  No marriage had begu
ugars do you love me?"  |  `A million pounds."  And he'd bounce me on my bed and make a `little 
Do... do you love me?"  |  Her head was bent and her words were hardly audible above the noise o
  |  `Do you love me?"  |  `Yes, I do, John, with all my heart."  |  `That settles it."  |  The 
 `And do you love me?"  |  He did not answer this question.  |  `Oh Angel - my mother says she k
  |  `Do you love me?" he murmured, his mouth exploring her ear.  |  She nodded dumbly.  He held
  |  `Do you love me?"  Andy asks, looking up at him.  |  `Of course I love you," John says.  | 
er.  `Do you love me?"  |  `No."  |  `That's right.  Killed anyone lately?"  |  `Three last nigh
s.  ` Do  you love me?"  |  He stood very still for a few seconds, a faint frown lining his foreh
ly.  `Do you love me?"  |  Caroline jerked her hand back, and Nicolo caught it and held it in hi
ly.  `Do you love me?"  |  `Yes," she said, `of course I do.  I love you with all my heart."  | 
s? "  Do you love me?  There was a loud scream of affirmation, and it was only then, as the audi
s? "  Do you love me?  Surely he didn't need to ask.  The audience of willing females had shoute
s? "  Do you love me?  Shelley held on tight to the seatbelt, and looked sideways at Miguel.  Th
s? "  Do you love me?  And Shelley shook her head to clear it.  Was she so very tired that she c
, but do you love me?"  |  The words were what she had longed to hear, and she stayed silent, sa
 me.  Do you love me?"  |  She came to life, put her arms round his neck, and stroked his hair. 
 much do you love me?  (.) That much?  Okay.  (.) You're only having little bits.  (.) You're no
 Rach do you love me?  (.) Do you love mummy?  (.) Do you love nanny?  (.) No! [laugh]  

The responses to it provide many opportunities for discussion - learners could at the very least debate which response they would (not) prefer, engaging in significant amounts of communicative interaction in the process. The concordance also displays a number of features which might warrant further serendipitous exploration (for example, the phrase with all my heart), as well as potentially stimulating the user to find out more about certain situations by browsing the source texts.
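The oracular procedure (find a phrase, then read what follows it) can be imitated on any plain text. The sketch below is a rough illustration under stated assumptions: the text is invented, the function name continuations is hypothetical, and the fixed character window only approximates the right-hand context of a concordance line.

```python
import re

# Invented toy text; not taken from the BNC.
TEXT = (
    "She looked up. Do you love me? Of course I do. "
    "He paused. Why don't you ask her yourself? I might."
)


def continuations(text, phrase, width=30):
    """Return up to `width` characters following each occurrence of `phrase`."""
    pattern = re.compile(re.escape(phrase))
    return [text[m.end():m.end() + width].strip() for m in pattern.finditer(text)]


print(continuations(TEXT, "Do you love me?"))
print(continuations(TEXT, "Why don't you"))
```

A wider window, or a split on sentence boundaries, would bring the output closer to the responses shown in Figure 17.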

4. Conclusions

The intriguing nature of this last concordance finds little echo in the literature as a whole, which reveals relatively little enthusiasm for the idea of giving teachers and learners direct access to large corpora. One particularly sceptical observation is the following:

[...] simply dumping 200 million words of corpus data in front of people isn't going to be much help for most teachers and students. It takes time, commitment and some good software tools to become really expert in the analysis of this type of material. (Clear 1996: 27)

In this paper I have illustrated four ways in which I believe learners can, with practice, use large corpora productively. This is not of course to say that large corpora constitute a panacea for all ills or for all learners, and Clear's comment raises a series of questions which merit serious reflection.

4.1 Who may access to a large corpus be useful for?

There is, I think, little doubt that a corpus like the BNC can only be used profitably by fairly advanced adult learners - as well as, of course, by teachers. The linguistic complexity of many citations and their relative unpredictability, given the limited context available in a one-line concordance display and the variety of texts contained in the corpus, mean that it is much more difficult to make sense of concordance lines than to consult a learner dictionary, grammar, or textbook. However, understanding does not necessarily have to be complete in order to be of value, and learners can in most cases be left free to select those citations they are best able to make sense of. Unlike professional linguists, they are under no obligation to account for all of the data, or to do so in a manner which meets linguistic criteria of descriptive adequacy. Learning a language proceeds by progressive approximations, and partial generalisations derived from limited data are essential to that process, always provided that their partial nature is recognised (Aston 1995). What seems important is that the learner can make enough sense of the data, and draw conclusions of sufficient relevance, to maintain interest and motivation, so that interpretative skills have the opportunity to improve with practice. Working in pairs or small groups may help in these respects.

4.2 What are its limits?

While large corpora are rich resources, there are nonetheless limits to the kinds of information which can be obtained from them. These depend largely on the design and encoding of the corpus, and the software used to interrogate it. While an excellent source of lexical information, the BNC, for example, can only really be used to study a limited set of grammatical patterns, namely those which have distinctive lexical correlates. While it is easy enough to find all the occurrences of enjoy, and to sort them according to the part-of-speech category of the following word, it is impossible to find all cases of verbs followed by a gerund, since the SARA index does not include part-of-speech categories such as "all verbs" or "all V-ing forms". And not all lexical correlates are sufficiently unambiguous to allow them to be used in queries: any search for restrictive relative clauses would drown the user in irrelevant data, given the number of other uses of wh- pronouns and of that in the language (not to mention the impossibility of identifying relative clauses with pronoun deletion, as in the man I saw). Particular semantic and pragmatic categories (doubt, cognisance, disagreements, summaries, etc.) are difficult to locate for the same reason. Nor is the BNC the place to study many features of spoken discourse: transcripts are orthographic, paralinguistic features are only roughly indicated, and situational description is limited. This means, for example, that while one can compare speech by men and by women, one cannot compare speech to women and to men.

A large mixed corpus is also inappropriate for the study of highly specific text-types or genres, any one of which is unlikely to be adequately represented, and may not be recognisable from the encoding. There are very few business letters in the BNC, just as there are very few service encounters, and those wishing to explore their specific conventions would do better to compile a small corpus including only texts of those types. It should also be borne in mind that the BNC contains contemporary British English: those interested in other geographical or historical varieties should look elsewhere - though they might still want to use the BNC to carry out contrastive analyses.

4.3 How much training is required?

Large corpora are complex, as are the software programmes to interrogate them. It takes time and practice to learn how to formulate queries which will effectively find what one is looking for, without omitting too many relevant instances or including too many irrelevant ones. Our experience with undergraduate learners of English at Bologna University is that they need a minimum of eight hours' hands-on instruction and a similar amount of individual practice in order to feel reasonably at ease with SARA and avoid the more obvious pitfalls in its use. The required training is not simply a matter of learning about the corpus and the software, but also one of learning how to learn from them. Learners will need practice in recognising patterns of collocation, colligation, semantic preference and semantic prosody, in hypothesising possible formal variants, and in watching out for associations with particular positions, texts or text-types, or users and user-types. They may also need to learn to make and to value partial generalisations of a relatively low-level nature. They may, for instance, need to learn to notice that the most typical thing to be rekindled is memories, rather than attempting a blanket generalisation to "past feelings" which hides this specific fact (Aston 1997). And they must learn to handle numbers, understanding which frequencies and differences may be significant, not so much in a statistical sense as in the more general one of always asking whether the numbers are large enough to warrant inferences.

4.4 Is it worth it?

Training takes time and energy, as does corpus use. There have as yet been no empirical studies to show whether corpus-aided activities of the kinds I have outlined here are worth the investment in terms of results. From a theoretical perspective, however, it can be hypothesised that the on-stage use of a large corpus might have the benefits listed below, whose extent would seem to make further research and experimentation desirable.

  • The size and variety of the corpus, and of the learning activities it offers, make it a virtually inexhaustible source of information about the language, and a highly motivating one for many learners.
  • The availability of multiple examples from which the learner can select memorable aspects, making generalisations at various levels, should make for effective inductive learning; it also allows deductive testing of a wide range of hypotheses to be carried out.
  • The need to design queries which will obtain appropriate data, and to analyse and categorise that data satisfactorily, should develop language awareness.
  • The corpus can also be a rich source of encyclopaedic information.
  • Analysing data may develop reading skills, particularly of a bottom-up nature, as the user attempts to infer meaning from the limited context provided in concordance citations. (The importance of such skills seems underestimated by current approaches to second language reading, which principally stress top-down prediction.)
  • Corpus use can generate much communicative interaction, both in elaborating queries and manipulating solutions, and in discussing and reporting on findings.
  • Inasmuch as learners can use corpora to learn the language for themselves, and develop their sensitivity to language and refine their learning strategies in the process, corpora are instruments which can further autonomy.
  • Large corpora can make learners less dependent upon other sources of information, such as teacher and textbook, and indeed place them in a position to critique their statements. This may modify learners' relationships with their teacher and textbook, as the latter pass from a role of linguistic authority to that of facilitating the learner's own processes of research and discovery.
Notes

    1. The BNC was developed by Oxford University Press, Longman, and Chambers Harrap, along with Oxford University Computing Services, UCREL at Lancaster University, and the British Library. It can currently be accessed in the following ways:

  • purchasing the corpus and installing it on a UNIX server, where it can be consulted over a local network with either the specially-designed SARA program or any other appropriate interrogation software. Rather than the corpus itself (£220), the major cost is the necessary 5 gigabytes of disk space.
  • subscribing to the on-line service at the British Library, where the corpus can be consulted over the Internet using the SARA Windows client (£60 p.a.).
  • using the free web service at the British Library: this only provides a maximum of 25 concordance lines, and offers more limited query and display options.
    For further details, see the BNC web pages at http://info.ox.ac.uk/bnc

    2. SARA (SGML-aware retrieval application) was developed by Tony Dodd. It can be downloaded free of charge from the BNC web site at http://info.ox.ac.uk/bnc

    3. 307539 utterances by female speakers, 304278 utterances by male speakers.

    4. Since a high proportion of spoken occurrences of pretty in the BNC have ambiguous part-of-speech tags, where the automatic tagging program was uncertain whether use was adjectival or adverbial, these figures are based on a manual analysis.


References

  • Aston, G. 1995. "Corpora in language pedagogy: matching theory and practice". In G. Cook and B. Seidlhofer (eds), Theory and practice in applied linguistics. Oxford: Oxford University Press. 257-270.
  • Aston, G. 1997. "Enriching the learning environment". In Wichmann et al. 51-64.
  • Aston, G. and L. Burnard 1998. The BNC handbook: Exploring the British National Corpus with SARA. Edinburgh: Edinburgh University Press.
  • Bernardini, S. 1997. "A trainee translator's perspective on corpora". Online. http://www.sslmit.unibo.it/cultpaps/trainee.htm
  • Bernardini, S. forthcoming. Competence, capacity, corpora. Bologna: CLUEB.
  • Clear, J. 1996. "On-line dictionaries: the way ahead?" IATEFL newsletter, 130. 27.
  • Ellis, N.C. 1996. "Phonological memory, chunking, and points of order". Studies in second language acquisition, 18. 91-126.
  • Flowerdew, J. 1993. "Concordancing as a tool in course design". System, 21. 213-229.
  • Flowerdew, J. 1996. "Concordancing in language learning". In M. Pennington (ed), The power of CALL. Houston: Athelstan. 97-113.
  • Johns, T. 1991. "Should you be persuaded: two examples of data-driven learning". In T. Johns and P. King (eds), Classroom concordancing. ELR Journal, 4 (special issue). 1-16.
  • Johns, T. 1994. "From printout to handout: grammar and vocabulary teaching in the context of data-driven learning". In T. Odlin (ed), Perspectives on pedagogical grammar. Cambridge: Cambridge University Press. 293-313.
  • Jordan, G. 1992. Concordances: Research findings and learner processes. Unpublished MA thesis. University of London Institute of Education.
  • Krishnamurthy, R. 1996. "Ethnic, racial and tribal: the language of racism?" In C.R. Caldas-Coulthard and M. Coulthard (eds), Texts and practices. London: Routledge. 129-149.
  • Leech, G. 1992. "Corpora and theories of linguistic performance". In J. Svartvik (ed), Directions in corpus linguistics. Berlin: Mouton De Gruyter. 105-122.
  • Mindt, D. 1997. "Corpora and the teaching of English in Germany". In Wichmann et al. 40-50.
  • Murison-Bowie, S. 1993. MicroConcord: Manual. Oxford: Oxford University Press.
  • Prabhu, N. 1987. Second language pedagogy. Oxford: Oxford University Press.
  • Sinclair, J. McH. 1991. Corpus, concordance, collocation. Oxford: Oxford University Press.
  • Skehan, P. 1998. A cognitive approach to language learning. Oxford: Oxford University Press.
  • Stubbs, M. 1996. Text and corpus linguistics. Oxford: Blackwell.
  • Wichmann, A., S. Fligelstone, A. McEnery and G. Knowles (eds) 1997. Language corpora and teaching. London: Longman.
  • Widdowson, H.G. 1989. "Knowledge of language and ability for use". Applied linguistics, 10. 128-137.
  • Willis, J.D. 1990. The lexical syllabus. London: Collins.
  • Willis, J.D. and J.R. Willis. 1987. Collins Cobuild English Course. London: Collins.