Let’s talk about language

Dear Reader, we’re very glad you’ve joined us to celebrate something of a milestone for Cam Lang Sci: today is our 100th post! We’ve reached our centenary via adventures with I, you and she, and s/he; puns and whistles; syntactic islands and waves of language change; and much besides. We’ve asked: What’s a word? What’s in a name? And do happiness and sadness taste like sweet and sour chicken?

CC Rsa

Looking back over all the posts, I think that, while the precise topics reflect our own research and personal interests, we’ve managed at least to dip our toes into the six foundational areas of linguistics – phonetics, phonology, morphology, syntax, semantics and pragmatics – and sometimes we’ve dived right in! We’ve also seen a range of methodological currents: theoretical syntax, corpus linguistics, and experimental work both in a phonetics lab and in a classroom. There’s still much we haven’t covered, though, and as new students join our merry band we hope to test the waters of many other areas in the coming months.

As well as trying to share what we find interesting about linguistic research and to satisfy our readers’ curiosity about language (and to be honest, who isn’t fascinated by one of the defining characteristics of human nature?), in this blog we’ve hoped to provide a more varied linguistic diet than might be served up by the media in general. But what do they tend to report on?

Over the past few years I have conducted a bit of a survey of reports on language in the media. Well, ‘survey’ is too strong a word. What I did was set up a Google News alert for ‘language’; when I had time (most, but certainly not all, days), I looked at the headlines from the world press (and sometimes blogs as well) that it happened to select for me, and if any looked ‘linguisticky’ enough, I read the story and added the web link to my file. Calling it a ‘convenience’ sample would only be flattery.

I’ve now got 72 entries, and it’s rather interesting to see how they’re distributed around linguistic space when you categorise them in a broad-brush way. To my surprise, there weren’t many on topics that I might have expected to attract the sensationalising media: language evolution crops up four times, prescriptivism or prescriptivist views five times, and linguistic relativity only a couple of times. What people are interested in (or what journalists think people are interested in) is perhaps unsurprising, though, when you think about it: it’s the things that we attach personal value to. For example, everyone cares about how and when their child learns to talk (as eavesdropping at any toddler group will tell you), and acquisition is a popular topic, with eight articles. Likewise, a related group of topics about national language policy, education language policy, and language change is also popular (13 hits). Whether it’s preserving endangered languages, the rise of global English or the need for Brits to get into learning foreign languages, when language meets politics, people are likely to start feeling hot under the collar. And of course sociolinguistics also gets a good look-in (with seven), as the way we speak is in itself something we really value: our traditional dialects, the rise of multilingualism, or gender differences. It’s also encouraging to see sign languages featuring prominently (with six articles to their name), reflecting the growing awareness of sign languages as full and fascinating linguistic systems.

There is, however, a noticeable, though perhaps understandable, lack of coverage of the more theoretical aspects of linguistics. Phoneticians, phonologists, syntacticians and semanticists may have to work harder to demonstrate the relevance of their work to those just beyond their field and way outside it, but it can be massively beneficial when they do – for the researchers as well as the readers. They are, after all, trying to understand the very nitty-gritty, the nuts and bolts, of how language works.

And that’s perhaps exactly where Cam Lang Sci and its friends can step into the breach. Here’s to the next 100!

Breaking news: Guy who learned Japanese from girlfriend speaks like a girl

The life of a Japanese learner is not an easy one: you’re faced with not one, but three, non-Roman writing systems, an array of politeness forms, and freaky word order options. To top it all off, the language learning community swarms with cautionary tales of how to make a fool of yourself by not only making simple grammatical mistakes but also *Psycho tune* using the language of the wrong gender. Stories like ‘Guy who learned Japanese from girlfriend now speaks like a girl’ or ‘Girl shunned for using male language’ could make Daily Mail headlines were the publication more linguistically inclined.

... and speak accordingly! Image credit: Beth Granter.

Although reality isn’t quite as much of a minefield as a cheeky Google search for ‘why is learning Japanese so difficult?’ might suggest, gendered language is a very real phenomenon in Japanese. In this case it is not a grammatical matter, and it is separate from grammaticalised aspects such as gender-specific or neutral pronouns: if you use a form strongly associated with the opposite gender, your utterance won’t be deemed ungrammatical, just weird or out of place. Rather, gendered language reflects gender roles and ideologies of what female and male speech should sound like; very broadly, female language tends to be more submissive and gentle, and male language more direct. Indicators of gender are scattered throughout the language, showing up in choices of words, interjections (things like oh, uhm), directives (i.e. commands, requests, and questions), pronunciation, and so on, but most prominently in the choice of sentence endings and of pronouns referring to ‘I’ and ‘you’.

Take sentence endings first. Japanese is full of particles – a bit of a dustbin category for little word-like elements that don’t always mean much on their own – and many of these appear sentence-finally, expressing things like questioning or affirmation: think of ‘this is nice, isn’t it’ type of things. One of the most clearly gendered expressions here is wa: as a sentence-final particle, it indicates the femininity of the speaker. A girl would typically say takai-wa (‘tall’), but the same utterance from a boy would be ridiculed as effeminate, the socially prescribed option being plain takai. At the opposite end of the scale is zo, which indicates new information and is used exclusively in male speech. It is considered informal and even rude, mirroring the directness ideologically associated with male speech. A gender-neutral way of expressing a similar meaning is the particle yo.

Where things get slightly more puzzling for a Western learner is the proliferation of words referring to ‘I’ and ‘you’. Some of them relate to degrees of politeness and differences in social status, but many of them encode additional aspects of gender. Gender-neutral choices are exemplified by the Japanese class favourite watashi ‘I’. Typically feminine pronouns are again perceived as softer and gentler; these include atashi, atakushi, and uchi, while typically male pronouns include boku and ore. As for referring to ‘you’, male forms tend to be more direct: kimi, omae, anta. Feminine counterparts encode a greater degree of politeness, so that a typical form of address is the pronoun anata, followed by the addressee’s name or title and a socially appropriate marker.

Boku? Watashi? Ore? Who am I? Image credit: myrealnameispete.

Forms differ, then, but whether there is a yo or a zo at the end of a sentence doesn’t say much in itself to anyone but a Japanese aficionado. Where things get interesting is when these funny little word forms are considered in the broader social context (cue gender studies students!).

Slightly archaic as it may sound with its submissive feminine and direct masculine forms, gendered language as it is conceived of today is in fact a relatively recent innovation. This also goes against the popular depiction of gendered language as an ancient tradition, a case of this-is-the-way-it-has-always-been. Although differences between male and female speech had been recorded earlier as well (and this is not surprising: even in languages like English, where ‘gendered language’ is not made into a big deal for learners, speakers will think, perhaps unconsciously, of certain ways of speaking as typically feminine or masculine), gendered language proper kicked off after the start of the Meiji era, which began in 1868. Orie Endo, something of a celebrity among Japanese linguists, compared two literary works, Ukiyoburo from 1813 and Sanshiro from 1909, to show the timescale over which the modern gender differences emerged. In the earlier text, the differences in speech patterns reflect social status but not gender, while in the later one gendered differences have clearly emerged.

Of course, particles, pronouns and the like don’t just turn into carriers of gendered meanings in a vacuum: as always in language change, there is a human component. At the start of the Meiji era, schoolgirls, as teenagers so often do, came under criticism from men higher up the social ladder for speaking ‘improperly’, in a ‘vulgar’ or ‘unpleasant’ way (déjà vu? My earlier post on be like, might, like, bear like a resemblance to this). But schoolgirls grow up, and some of them go on to become role models. At the time, the ideal of ryoosai kenbo ‘good wife, wise mother’ was in circulation: it was supported by the government and featured in women’s magazines, written about by those very schoolgirls in the very language they had been criticised for. This form of language became associated with the ideal middle class and was therefore something to aspire to; and voilà, the parlance of vulgar schoolgirls had become the new vogue.

Babbling away in improper Japanese. Image credit: Danny Choo.

The establishment of the new feminine language, or onna kotoba, was further propelled by reactions to the rapid modernization and westernization processes that Japan was undergoing: the nation needed traditions to hold on to, and gendered language made Japanese conveniently unique compared to the incoming western influences.

That is not to say that, once established, gendered language has been immune to change. Quite the contrary: recent developments see female and male speech losing their distinctness. Young women have been reported to have stopped using feminine speech in favour of more neutral or even masculine language, with teenage girls taking over traditionally male pronouns such as boku and ore. Some male forms are taking on a function of female empowerment: kimi ‘you’, usually used by men to address close female friends, is now also used by women to talk down to men. The linguistic changes can, again, be tied to cultural shifts: more women than ever before are now delaying marriage and pursuing careers, and a speech form intended to convey submissiveness does not fit well with this emancipation of sorts. Interestingly, self-identifying male speakers are not taking on features of female speech, and this seems to be so ingrained in the gendered mindset that it does not happen even in soliloquy, or speaking alone. So, while women happily use masculine forms even when blabbering to themselves, men don’t use feminine forms in the same way. This, some would argue, reflects the greater value associated with the masculine gender image in the social hierarchy.

It's (linguistic) emancipation time! Image credit: DonkeyHotey.

But with that, I’m treading into non-linguistic waters. So, students of gender studies, rejoice – if you took in anything of the above, you are sorted for research topics.

Students of Japanese, on the other hand, relax – gendered language is becoming less and less of an issue for your learning process, and your gender-mismatched speech is unlikely to make a headline.


I said this was a hot topic, and the internet in particular is full of thrilling reading. I’ve drawn inspiration, examples, and information from Tofugu, Oxford Dictionaries, The Japan Times, LinguaLift, and Japanese: A Linguistic Introduction by Yoko Hasegawa.


Syntactic Islands

Last week’s post on movement highlighted just how useful it can be to think of elements in a sentence being able to move to different positions.

One of the really interesting things about movement is that it seems to be unbounded. In other words, there are apparently no bounds to how far an element can move (I say seems and apparently because there is a lot of evidence to suggest that the situation is far more complex. However, I’ll ignore those details here). We can see this unboundedness in so-called wh-movement (it’s called wh-movement because the element undergoing this type of movement typically begins with the letters wh– in English, e.g. who, what, where etc.). In (1b), the wh-phrase what is interpreted as the direct object of the verb see. Since direct objects in English normally follow the verb, as in (1a), what is also thought to originate in this position (I’ll indicate this original position with what in strikethrough, indicating that it is not pronounced).

(1) a. You saw something.

b. What did you see what?

The interesting thing is that what can appear arbitrarily far away from its original position.

(2) a. What did you see what?

b. What did he say that you saw what?

c. What did she think that he said that you saw what?

d. What did they believe that she thought that he said that you saw what?

e. …

However, the story is much more complicated and interesting. In his 1967 PhD thesis, John Robert ‘Haj’ Ross identified various syntactic ‘islands’. Syntacticians generally take ‘islands’ to be units of structure that elements cannot escape or move from.

We saw in (2) that a wh-phrase can apparently move as far away from its original position as it wants. But now consider the following sentences:

(3) a. I met the man who saw a ghost.

b. I visited the house that you saw a ghost in.

The examples in (3) contain relative clauses (surprise, surprise! See my other posts) – who saw a ghost is a relative clause modifying the noun man in (3a), and that you saw a ghost in is a relative clause modifying the noun house in (3b). In (2), we attempted to move a wh-phrase which originated as the direct object of the verb see. As we saw, the result was a well-formed English sentence. So let’s try to do the same thing with the examples in (3).

(4) a. *What did I meet the man who saw what?

b. *What did I visit the house that you saw what in?

The examples in (4) are crashingly bad English sentences (hence the *)! In fact, if I’d put these sentences at the beginning of this post, you’d probably be wondering what on earth I was trying to say. But what’s wrong with them? What’s the difference between the examples in (2) and the examples in (4)?

As Ross observed, the problem with (4) is the relative clause. The relative clause seems to be an island, i.e. wh-phrases cannot escape from it.

There are other types of island besides relative clauses. Consider the example in (5), which involves two conjoined (or co-ordinated) direct objects.

(5) a. You saw a ghost and a monster.

b. *What did you see what and a monster?

c. *What did you see a ghost and what?

As (5b) and (5c) show, we cannot move out of co-ordinate structures (Ross called this the Co-ordinate Structure Constraint).

Relative clauses and co-ordinate structures seem to be very strong islands, i.e. if we attempt to move an element out of such islands, the result is very bad (given how much I’ve worked on relative clauses, I’m in two minds about whether I’m stuck on them because they are strong in the sense of an island paradise which you never want to leave, or in the sense of Alcatraz!).

Other structures seem to be weaker islands, i.e. we can move elements out of them, but the result is not quite fully acceptable (this is marked with a ? at the beginning of the example). An example of a weak island can be seen in (6b) (compare it to (6a), which does not contain an island).

(6) a. What do you think that I saw what?

b. ?What do you wonder whether I saw what?

The island effect seems to come from the fact that we are trying to move an element out of a subordinate clause beginning with whether. Similar effects are found with subordinate clauses beginning with how, where, who(m), what. They are thus called wh-islands because these islands are introduced by elements typically beginning with wh– in English.

(7) a. You asked how I fixed the car.

b. ?What did you ask how I fixed what?
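The three-way split described so far (no island, weak island, strong island) can be summarised as a simple lookup. What follows is only a toy sketch, assuming we have already worked out by hand which structures a wh-phrase has to leave on its way to the front of the sentence; the labels relative_clause, coordinate_structure, wh_clause and that_clause are hypothetical names chosen for illustration, and <what> is used here to mark the unpronounced original position. Crucially, a lookup like this only records which structures are islands; it says nothing about why they are.

```python
# Toy summary of the island classification in this post (hypothetical labels).
# It records WHICH structures block extraction; it does not explain WHY.

STRONG_ISLANDS = {"relative_clause", "coordinate_structure"}  # extraction gives '*'
WEAK_ISLANDS = {"wh_clause"}  # clauses introduced by whether, how, etc.: gives '?'


def judge_extraction(crossed: list[str]) -> str:
    """Return '*', '?' or 'ok' for a wh-phrase moving out of the given structures."""
    if any(label in STRONG_ISLANDS for label in crossed):
        return "*"  # crashingly bad, as in (4) and (5b, c)
    if any(label in WEAK_ISLANDS for label in crossed):
        return "?"  # degraded but not fully unacceptable, as in (6b) and (7b)
    return "ok"  # no island crossed, as in (2), however many clauses deep


# Each sentence is paired with the structures its wh-phrase has to leave;
# <what> marks the unpronounced original position.
examples = [
    ("What did she think that he said that you saw <what>?", ["that_clause", "that_clause"]),
    ("What did I meet the man who saw <what>?", ["relative_clause"]),
    ("What did you see <what> and a monster?", ["coordinate_structure"]),
    ("What do you wonder whether I saw <what>?", ["wh_clause"]),
]

for sentence, crossed in examples:
    print(judge_extraction(crossed), sentence)
```

Checking strong islands before weak ones means that a wh-phrase crossing both kinds is judged '*'; that ordering is my own simplifying assumption, not something Ross (1967) states.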

Although it has been nearly 50 years since Ross first identified his ‘islands’ (and there are many more that I have not mentioned), they continue to pose problems for syntactic theory. A major step was to identify the islands in the first place. This shows how important it is to consider not only what languages can do, but also what they can’t (there’s also the massive question of how we intuitively know that sentences such as those in (4) and (5b,c) are bad). The next step is to understand what makes an island an island (and whether all islands are in fact alike). We can list them and classify them as strong or weak, but ideally we’d want to know why these structures are islands and not others. Attempts have been made (notably by Chomsky (1973); see also the recent overview of the issues by Boeckx (2012)), but the problem remains.


Boeckx, C. (2012). Syntactic Islands. Cambridge: Cambridge University Press.

Chomsky, N. (1973). Conditions on Transformations. In Anderson, S., & Kiparsky, P. (eds.) A Festschrift for Morris Halle (pp. 232-286). New York: Holt, Rinehart & Winston.

Ross, J.R. (1967). Constraints on Variables in Syntax. Doctoral dissertation, MIT.


Moving things moving around

Image taken from http://www.geograph.org.uk/photo/4203581. Copyright © Bob Harvey (http://www.geograph.org.uk/profile/8272) and licensed for reuse under this Creative Commons Licence.

It has recently come to my attention that – although we’ve made numerous references to the issue – we don’t seem to have had a proper post on this blog devoted to one of the most important and central* ideas of modern syntactic theory: movement.

Take, for example, the sentence Are you a cat? Now, normally in English verbs come after subjects, e.g. in the statement You are a cat. A good way of looking at the differences between the statement and the question is to say that, in the latter, the verb has moved from its usual position after the subject to a position at the start of the sentence. Syntacticians like to represent sentences using so-called “tree diagrams” (we needn’t go into the reasons for this here) and the one for Are you a cat? looks something like this:


The arrow here indicates the movement and I’ve “struck out” the lower copy of are to show that it’s not pronounced.

Can things other than verbs move? Of course they can. Compare The cat sipped the milk with The milk was sipped. In both cases, the milk is semantically the “object” of the verb sipped – the same thing happens to it in both sentences – but in the second (“passive”) sentence it appears in the subject position! A nice way of accounting for this is to say that it, too, moves to a higher position in the sentence. (Obviously we still have to account for things like the appearance of was, what’s happened to the cat etc., but those are somewhat separate issues.)


Another place we see movement is in sentences like What film shall we watch? Semantically, what film is again the object here, and we know objects ordinarily follow verbs in English, so again we can say that it’s moved from the end of the sentence to the start.

A potentially big advantage of accounting for these things in terms of movement is that it allows us to unify our explanations of what’s going on in all these different types of sentences: we can say they are all instances of a single phenomenon, movement, rather than having to come up with separate explanations for each case.

There’s been a huge amount of work on the theory of movement and it’s proved very profitable; it seems to tell us a great deal about language. An interesting thing that has come out of this work is that there seem to be restrictions on movement: you can’t, in practice, just move anything anywhere. For example, it appears that across languages movement always or almost always goes up the syntactic tree, not down it. In English, this means something can move leftwards in a sentence (as in the examples given in this post) but not rightwards – so you never get sentences like You are a cat are where the verb moves to the other side of the object.


(A challenge for the reader: can you come up with any apparent counterexamples to this claim that we don’t get movement down the tree / to the right in English?)

I personally think the idea of movement is one of the best insights to come out of linguistic theory; it’s truly impressive how much it can tell us about language. You only need to read some of our other posts on this blog – try looking under the “syntax” tag – to see just what a wide range of data it’s helpful in explaining.

* Caveat: many approaches to syntax do reject the idea of movement. This seems wrong-headed to me, as they still have to come up with some way of accounting for the facts discussed in this post, and movement is arguably the most straightforward way of doing so.

The words and sentences not taken

Two roads diverged in a wood, and I—

I took the one less traveled by,

And that has made all the difference.

— Robert Frost, The Road Not Taken

Ask Chris: Last week I watched a TV interview with a famous Chinese novelist. I enjoy most, if not all, of his works, but I found that his speech in the interview was not as fascinating as his novels. That reminded me of my best friend, who is really good at telling us her stories but can never write good articles. I believe that we use the same language in speaking and writing, so how can this be? Is there any difference in the use of language when we speak and when we write? Is the difference limited to Chinese?

(Note by Chris: This post touches on the development of the Chinese language, since the original question was asked by a Chinese netizen, but I hope this will not put off other readers. If you find it difficult, imagine that you are in the Middle Ages, when the common written language was Latin.)


Chris answers: Of course we use the ‘same language’ when we speak and write, if we are talking about the general system of information coding. However, if a language has a long history of written records, or has been used on formal occasions, it will develop two sub-systems: the spoken system and the written system. This phenomenon is not limited to Chinese: English, Japanese, and other well-known and lesser-known languages all have these two sub-systems, so among native speakers of any language, a good speaker may not be a good writer, and vice versa.

Although structuralism is not the current trend in the field of linguistics, it is still very useful when we analyse a language as a comprehensive system of symbols. When we define a language, we need to define all the possible symbols that can be used, and all the possible rules and principles for combining them; these two together form the entire system of a language. However, not every element or rule of the system can be used anywhere: we prefer some elements and rules in spoken discourse and others in written discourse, and such preferences lead to two subsets within the system of symbols, which is where the differences between spoken and written language lie. In general, the two subsets share most of their elements, such as the sound patterns (which we call ‘phonological rules’), the word-formation rules (which we call ‘morphological rules’), the default word order, a number of lexical items, and some pragmatic rules, but there are exceptions, as we will see.

Please allow me to take Chinese and English as examples. Looking at the history of Modern Chinese, some people have the impression, or what I would call the misconception, that Modern Chinese is merely a spoken language because it originates from Vernacular Chinese (which is literally called ‘plain speech’ in Chinese). That is not true, though. When it emerged, Vernacular Chinese stood in contrast to formal written Classical Chinese, and there was a time when this variety was only used in spoken discourse. But as Vernacular Chinese developed and changed, it generated a written system and several literary traditions prior to the birth of Modern Chinese; one famous example is Dream of the Red Chamber from the Qing Dynasty. The words and sentences used in Dream of the Red Chamber were not exactly the same as those used in everyday spoken discourse at that time. Similarly, you will find that current works of Chinese literature make use of words and expressions that are rarely heard in our daily conversations.

Maybe you would like to argue that the differences exist in Chinese only because of its long history, and that life will be simpler if we move to English. Sadly, I have to tell you that this is not the case. Below is a sentence that I randomly selected from a paper I have to hand. I believe it is totally different from the sort of conversation you might hear between me and my friends:

When many different networks are generated in a process of simulated evolution, certain types of modular architectures are selected as “highly fit” in that they are particularly efficient at solving a given learning task.     (Jaap M. J. Murre, Models of Monolingual and Bilingual Language Acquisition)

This sentence differs from our daily chitchat in several respects. The range of words is rich, the lexical items are formal (you can judge this from their length), and the content words (nouns, verbs, and adjectives) are dense. The sentence structure is more complicated, since it includes subordinate clauses (introduced by when and in that), and the modifiers are longer. These distinctive features of written academic English mark the genre as separate from other genres, and that is exactly why some international students with relatively good speaking skills are required to learn ‘how to write academic English’ after they enrol on a university-level course.

In short, there are essential differences between the syntactic, semantic, and pragmatic aspects of the spoken sub-system, represented by our daily conversations and online chat, and those of the written sub-system, represented by works of literature, academic essays, manuals, and documents. In fact, these differences are the object of study in several areas of linguistic research, such as stylistics, discourse analysis, and sometimes also sociolinguistics.

Now let’s move back to your first question: why do people perform differently when they speak and when they write? The divergence between the spoken and written sub-systems creates a problem: if you use spoken elements when you write, or vice versa, your audience will feel awkward. If a lot of spoken elements are used in written language, the audience may feel that the author lacks vocabulary and that the content of her writing is too shallow. On the other hand, when written elements appear frequently in a piece of spoken discourse, the audience may easily be bored by the long sentence structures and difficult wording. The feeling that ‘the speech and articles by the same person are very different’ may be due to the person’s manner of presentation, or to a mismatch between the sub-system and the context of discourse.

Why do we have such a feeling when we notice a mismatch? Since I work on language acquisition and processing, my instinct is that the reason may lie in the mechanism of human language processing. When we process spoken language, we always do it linearly: speech is continuous, and most of the time we do not pause or backtrack (indeed, given the history of human technology, replaying speech was impossible when language first appeared). The information in spoken discourse is continuously pushed into our processing system, and to keep up with what follows, we do not have time to reconsider the hidden message of a particular word or phrase. Moreover, we may even have difficulty when we hear a less frequently used word in spoken language. Therefore, when processing spoken discourse, we expect the message to be clear and easy to understand.

Reading is another story. The most prominent feature of reading is that we can control how long we attend to a particular point, and we can backtrack to previous information, because written information is not ‘pushed’ at us. We may not notice it, but our eyes do not always move strictly forwards when we read. Eye-tracking studies have found that around 10% to 15% of eye movements during reading are backward (see reference), which means that we are going back to review information in previous constituents; these backward movements are called ‘regressions’. If a text is difficult because of rare lexical items, complex syntactic structures, or grammatical errors, readers will rest their eyes on a particular point for longer (each such pause of the eyes is called a ‘fixation’), or they will make more regressions. Below is a typical illustration of eye movement during reading, from Eye-Tracking While Reading – Kertz Lab – Brown University Sunset Wiki.
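To make ‘regression’ a little more concrete: if we note, for each fixation, which word the eyes landed on, then the proportion of regressions is simply the share of eye movements that land on an earlier word than the one before. The sketch below assumes word-level positions and a made-up fixation sequence, chosen so that the result falls within the 10% to 15% range mentioned above; it is not data from the cited studies, and real eye-tracking measures are considerably more fine-grained (character positions, first-pass versus later regressions, and so on). The function name regression_rate is hypothetical.

```python
def regression_rate(fixations: list[int]) -> float:
    """Share of eye movements (saccades) that go back to an earlier word.

    `fixations` lists the index of the word fixated at each successive fixation.
    A refixation on the same word counts as a saccade but not as a regression.
    """
    saccades = list(zip(fixations, fixations[1:]))  # consecutive (from, to) pairs
    if not saccades:
        return 0.0
    backward = sum(1 for start, end in saccades if end < start)
    return backward / len(saccades)


# Hypothetical reader of an 11-word sentence who skips word 4, then goes back to it.
fixations = [0, 1, 2, 3, 5, 4, 6, 7, 8, 9, 10]
print(f"{regression_rate(fixations):.0%} of saccades are regressions")  # prints "10% of saccades are regressions"
```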


At the same time, it should be noted that some high-level semantic and pragmatic processing does not always occur simultaneously with receiving information. This is most obvious when we appreciate rhetorical devices or read literature. Such processing is called ‘non-spontaneous interpretation’. Usually it requires more cognitive effort and processing time, and sometimes readers even need to re-evaluate the information they have received from the preceding text and simulate the intentions of the author. We can hardly perform such processing when we listen to spoken discourse, because stopping at any point would mean missing the ongoing flow of information.

All the processing differences mentioned above in turn influence the word choices, sentence structures, and information organisation we use in spoken and written discourse. Since the processing of spoken language is quick, plain, and linear, we make our speech short, direct, clear, and easy to follow, while the pauses and backtracking available in reading allow us to add complicated sentence structures, rare words, and rhetorical devices to our writing. If we organise information the way we do in spoken discourse when we write an article, each sentence will carry less information, and the article will seem shallow; conversely, if we speak the way we write, there will be too much information to process, and the lack of time for non-spontaneous interpretation will also affect how listeners feel about the discourse.

That is more or less the full story, and I hope you enjoyed this piece of my writing and all the information included in it. Unfortunately, if you ask me your questions face to face, what I will do is to recite the whole text to you – yes, I am indeed the kind of person who is better at writing than speaking. I knew this when I was still in kindergarten. I knew.


For more information about language processing in general, please refer to the following articles, and you can always stop and backtrack:

Furlong, Anne. “The soul of wit: A relevance theoretic discussion.” Language and Literature 20.2 (2011): 136–150.

Rayner, Keith, and Charles Clifton. “Language Processing in Reading and Speech Perception Is Fast and Incremental: Implications for Event-Related Potential Research.” Biological Psychology 80.1 (2009): 4–9.

Oh don’t be such a snob

We all know the type. Enjoying a Netflix and chill session, you innocently comment on whatever you happen to be watching – “Geez, George Clooney’s manliness is so different than Orlando Bloom’s” – when your grammar snob of a friend’s eyes light up. “Different to”, they hiss viciously. Or you’re offering a cutting-edge analysis of the current political situation over lunch – “…according to Merkel, who I disagree with” – when your solution to the Brexit crisis is interrupted by “Ahem, you mean with whom I disagree.” Yes, they get everywhere: in the Ecuadorian capital Quito, radical grammar pedants have coined the concept of ‘orthographic vandalism’ and go around correcting the grammar of Quito’s graffiti.

A few weeks back, Mona Ghalabi launched into a rant against grammar snobs in the Guardian. They use an “elite and increasingly outdated form of English language”, believe that “language evolves but grammar doesn’t”, and are, quite simply, “patronizing, pretentious and just plain wrong.” “If I look around a room and say there are less people here than I expected”, Ghalabi says, “does it really need to be pointed out that because people can be counted, I should have said there are fewer people here?”

No, it absolutely does not. I want to join forces with Ghalabi and deliver the final blow to grammar snobs: I present to you three ways in which allegedly substandard speech is, in fact, not a disgrace but a linguist’s goldmine.


Beware, grammar snobbery ahead! Tim Lawrenz.

Now, sometimes speakers say things that are quite frankly errors even by non-snobbish standards; everyone occasionally produces a slip like a hunk of jeep instead of a heap of junk. As random as they may seem, these errors are in fact constrained by the phonological structure of the language in question, and so they can tell the observant linguist something about it. In English, sphinx in the moonlight becomes minx in the spoonlight, never the sfoonlight that a straight swap of sounds would predict (what kind of deep, poetic conversation these examples would occur in, I don’t know). The reason is simple: native English phonology does not allow syllables beginning with sf, except in loanwords (sphinx itself being one). And so apparently innocent slips of the tongue turn out to offer insight into the psychological structures and processes involved in producing speech.

Things get even more interesting when you throw in an extra language. Bilingual children – and adults, as a matter of fact – sometimes mix words and structures from their two (or more) languages. To the uninitiated, this may look like a terrible failure to gain competence in either language. However, code mixing is very much not random, and it can be shown that language-specific grammatical constraints are at play even in mixes that look substandard at first sight. In child French, so-called weak pronouns (je, tu, il,…) can appear with finite verbs only (i.e. verbs that can function as the root of an independent clause: I like cake has a finite verb, I liking cake does not), while strong pronouns (moi, toi, lui) can appear with both finite and nonfinite verbs. French children might then utter sentences like Moi pousser (‘Me pushing’), with a strong pronoun and a nonfinite verb, but never Je pousser, with a weak pronoun. English, on the other hand, has only strong pronouns, so English children can happily say things like I washing.

Baby speakers - cute and informative. What else could a linguist ask for? Mulan.

Curiously, when French-English bilinguals use pronouns and verbs from different languages in one sentence, these constraints are still obeyed. I pousse là (‘I am pushing there’), with an English pronoun and a finite French verb, and They manger bonbon (‘They eating candy’), with an English pronoun and a nonfinite French verb, are attested, as is Moi play thing, with a French strong pronoun and an English nonfinite verb. But what French-English bilinguals never produce is precisely the case ruled out by French grammar: a French weak pronoun with an English nonfinite verb, as in Je find it.
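For readers who like to see a rule spelled out mechanically, the pattern above can be restated as a small lookup. This is only a toy sketch in Python, not Paradis and Genesee’s own analysis: it assumes each pronoun can simply be tagged as weak or strong and each verb as finite or nonfinite, and the names PRONOUNS and predicted_possible are made up for illustration. In this sketch the verb’s language plays no role at all; only the French constraint on weak pronouns does the work.

```python
# Toy restatement of the pronoun-verb pattern described above.
# Hypothetical inventory: pronoun -> (language, strength), taken from the examples in the text.
PRONOUNS = {
    "je": ("French", "weak"),
    "moi": ("French", "strong"),
    "I": ("English", "strong"),
    "they": ("English", "strong"),
}


def predicted_possible(pronoun: str, verb_is_finite: bool) -> bool:
    """Apply the French weak-pronoun constraint, whatever language the verb is in."""
    _language, strength = PRONOUNS[pronoun]
    if strength == "weak":
        return verb_is_finite  # weak pronouns need a finite verb
    return True  # strong pronouns combine with finite and nonfinite verbs


# (utterance, pronoun, is the verb finite?)
UTTERANCES = [
    ("I pousse là", "I", True),
    ("They manger bonbon", "they", False),
    ("Moi play thing", "moi", False),
    ("Moi pousser", "moi", False),
    ("Je pousser", "je", False),
    ("Je find it", "je", False),
]

for utterance, pronoun, finite in UTTERANCES:
    verdict = "possible" if predicted_possible(pronoun, finite) else "ruled out"
    print(f"{utterance}: {verdict}")
```

Running it marks Je pousser and Je find it as ruled out and the other four as possible, which matches the attested and unattested patterns reported above.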

For the final case, I would actually like to offer a special thank you to prescriptive grammarians, or grammar snobs. Their writings can provide evidence of how people spoke in the past. The texts preserved from several centuries ago rarely reflect the language of the illiterate part of the population, in many cases the vast majority, so linguists have to rely on indirect evidence. Influential prescriptive grammars of English appeared in the 18th century, notably Robert Lowth’s A Short Introduction to English Grammar and Lindley Murray’s equally imaginatively titled English Grammar. Both Lowth and Murray were notorious prescriptivists who firmly believed that Latin and Greek were superior to English. Their natural conclusion was that, because Latin and Greek happen to be relatively highly inflected languages, English should be, too. This mission of making English as worthy as the ancient languages is most famously embodied in the fight for whom instead of who in positions other than the subject.

While the ideology is intriguing in its own bizarre way, for the modern linguist the relevant fact is that these grammars point out the allegedly correct usage at all. If all speakers had been using whom, and thus living up to ancient standards, there would have been no need to correct anything. So the early grammarians’ snobbish efforts tell later linguists that people were already using who instead of whom in the 18th century.

Break bad, break the rules, and linguists will thank you. Chapendra.

“We should spend more time listening to what others have to say and less focusing on the grammar they say it with”, Ghalabi appeals. As a linguist, I disagree. We should focus on the grammar people speak with – but not so much on the grammar people claim we should speak with.

(Come to think of it, Ghalabi does have a point even in her last statement: please don’t interrupt my Netflix and chill just to point out how my slips of the tongue might inform the world about my mental processes. Thank you.)


If you fancy reading more, the study on bilingual grammars and pronouns, and much more besides, can be found here:

Paradis, J. and Genesee, F., 1996. Syntactic acquisition in bilingual children: Autonomous or interdependent? Studies in Second Language Acquisition, 18(1), pp.1–25.

What a lot of relatives you have!

English has a lot of relatives. I don’t mean languages to which it is related, but rather relative clauses. I’m only going to focus here on some so-called restrictive relative clauses. An example is given in (1), where the relative clause is that ate grandma.

(1) The wolf that ate grandma was in bed.

In (1), the relative clause helps us to identify which wolf we are referring to, i.e. out of all the wolves in context, we are referring to the one that ate grandma. In other words, the relative clause in (1) restricts the referent of the noun modified by the relative clause, in this case wolf.

There are quite a few types of relative clause which can be used to restrict the referent of a noun. Some of them look quite similar to one another but they behave in slightly different ways as we will see.

First of all, there are relative clauses introduced by relative pronouns (who or which) and those introduced by that. Let’s call them wh-relatives and that-relatives respectively.

(2)   a.  The wolf that ate grandma was in bed.

b. The wolf which ate grandma was in bed.

The noun modified by a wh-relative or a that-relative can correspond to a number of different positions inside the relative clause. In (2), for example, the noun wolf corresponds to the subject of ate. However, it could correspond to the object, like in (3), or the object of a preposition, like in (4), as well.

(3)   a.  The wolf that we saw was in bed.

b.  The wolf which we saw was in bed.

(4)   a.  The wolf that Red Riding Hood talked to was in bed.

b.  The wolf which Red Riding Hood talked to was in bed.

c.  The wolf to which Red Riding Hood talked was in bed.

Some people would say (4b) is not correct because it has a stranded preposition, and that (4c) is the correct version. However, we are interested in what English speakers actually do, not what some people think they should do. Interestingly, if we use a that-relative, like in (4a), we have no choice but to strand the preposition! (5) is not even acceptable to English grammar pedants! (* means unacceptable/ungrammatical).

(5)  *The wolf to that Red Riding Hood talked was in bed.

Big Bad Wolf

English also has restrictive relative clauses introduced by neither a relative pronoun nor that. Let’s call these zero-relatives because there is nothing (zero) visible/audible to introduce them. The noun modified by a zero-relative can correspond to an object or the object of a preposition in a relative clause. Some examples are given in (6).

(6)   a.  The wolf we saw was in bed.

b.  The wolf Red Riding Hood talked to was in bed.

So far, zero-relatives look just like wh-relatives and that-relatives except that the relative pronoun or that is missing. However, there is another difference. We saw in (2) that the noun modified by a wh-relative or a that-relative can correspond to the subject of the relative clause. However, this is not possible when the noun is modified by a zero-relative.

(7)  *The wolf ate grandma was in bed.

In (7), the intended meaning is the one where wolf corresponds to the subject of ate. However, (7) is unacceptable/ungrammatical. To express this meaning, we would need to use a wh-relative or a that-relative instead.

We have seen that a noun modified by a zero-relative cannot correspond to the subject of a relative clause. There are other restrictive relative clauses where the modified noun can only correspond to the subject. These are the so-called reduced relatives.

(8)   a.  The wolf eating grandma has such big ears, eyes and teeth.

b.  The person eaten by the wolf was grandma.

They are called reduced because they seem to be reduced versions of wh-relatives or that-relatives.

(9)   a.  The wolf which/that is eating grandma has such big ears, eyes and teeth.

b.  The person who/that was eaten by the wolf was grandma.

However, various pieces of evidence suggest that the examples in (8) are not the results of bits of (9) being deleted. For example, there are acceptable reduced relatives with no acceptable ‘full’ counterpart. Therefore, reduced relatives are not literally reductions of full relatives.

(10)  a.  The creature resembling grandma is a wolf.

b. *The creature which/that is resembling grandma is a wolf.

Reduced relatives in English are formed using the participle forms of the verb: either the present participle, e.g. eating in (8a), or the passive participle, e.g. eaten in (8b). Even though the passive participle looks like the past participle in English, the evidence tells us that reduced relatives can be formed using the passive participle, not the past participle.

(11)  a.  The wolf has eaten grandma.

b. *The wolf eaten grandma is in bed.

In (11a), eaten is a past participle (not a passive participle). If reduced relatives were formed using the past participle and if the noun modified by a reduced relative can only correspond to the subject of the relative clause, we would expect (11b) to be acceptable. However, it isn’t. This, among other things, tells us that it is the passive participle that is used to form this type of reduced relative.
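The positions discussed so far can be summarised as a small lookup table. The following is a rough Python sketch, assuming our own shorthand labels for the four relative-clause types and the three positions; ALLOWED_POSITIONS and predicted_ok are hypothetical names. It only restates the judgements in (2) to (8) and deliberately ignores other factors, such as the choice of participle in (11) or the preposition in (5).

```python
# Positions inside a restrictive relative clause that the modified noun can
# correspond to, as summarised from the examples above (shorthand labels are ours).
# For reduced relatives, "subject" means the subject of the participle clause.
ALLOWED_POSITIONS = {
    "wh-relative": {"subject", "object", "object of preposition"},
    "that-relative": {"subject", "object", "object of preposition"},
    "zero-relative": {"object", "object of preposition"},
    "reduced relative": {"subject"},
}


def predicted_ok(relative_type: str, position: str) -> bool:
    """True if this relative type lets the modified noun fill this position."""
    return position in ALLOWED_POSITIONS[relative_type]


# (example sentence, relative type, position of the modified noun)
EXAMPLES = [
    ("The wolf that ate grandma was in bed.", "that-relative", "subject"),  # (2a)
    ("The wolf Red Riding Hood talked to was in bed.", "zero-relative", "object of preposition"),  # (6b)
    ("The wolf ate grandma was in bed.", "zero-relative", "subject"),  # (7), starred
    ("The person eaten by the wolf was grandma.", "reduced relative", "subject"),  # (8b)
]

for sentence, rel_type, position in EXAMPLES:
    verdict = "ok" if predicted_ok(rel_type, position) else "ruled out"
    print(f"{verdict:9} {sentence}")
```

Laid out this way, zero-relatives and reduced relatives look close to mirror images: one excludes the subject position, the other allows only that position.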

There is a lot more to say, and we haven’t even mentioned all the types of relative clause that English has to offer! But that must wait for later. If I say any more at present, I fear you might start to envy grandma.

(No grandmas were harmed in the writing of this blogpost… well, one was eaten, but the rest are fine)

Domains within sentences

One idea that has emerged in modern linguistics is that sentences can be divided into different parts or “domains”, each with its own separate function.

At the core of the sentence are the verb and its “participants” – the nouns or pronouns associated with it. This is called the classification domain, where basic properties of the sentence are classified. However, there’s nothing here about information like when the event took place, or even if it happened at all. You might like to think of the content of this domain as something like the logical representation love(Lucy, Chris), meaning (roughly) “an event of loving with participants Lucy and Chris” – the representation says nothing about whether the event takes place in the past, the present, the future or not at all.


Next, we get the anchoring domain, where the event is anchored in the world in some way. In English this domain precedes the classification domain and involves things like tense (often shown by auxiliaries like did) and negation (e.g. n’t, not). Subjects also move to occupy a position in this domain.

 Lucy didn’t love Chris

Some languages don’t use time/tense to anchor their sentences in the world, but other things, like location. For example, the Yagua language (spoken in Peru) has a suffix mu which shows that an event took place downriver relative to the place of speaking. So naadarããyããmuyada means “they two danced around downriver”. Interestingly, all languages seem to require anchoring of some sort.

The next domain, which precedes the other two in English, is called the linking domain. This can, for example, contain elements which link the clause to other clauses, such as the word that, which marks a clause as subordinate (embedded within a larger sentence):

(I believe) that Lucy didn’t love Chris

Question elements like why or what which link the sentence to the wider discourse also come in this domain, and hence occur toward the start of the sentence in English.

I think it’s very interesting that sentences can be divided up into different domains in this way and believe it has the potential to tell us a great deal about how the human mind works.

Adieu to French? That’s no fait accompli.

Jeremy Paxman’s opinion piece in the FT last week, in which he called the French language ‘useless’, has, unsurprisingly, caused something of a furore.

Articles elsewhere covering Paxman’s piece picked up on such quotable lines as:
‘It is time to realise that in many parts of the world, being expected to learn French is positively bad for you’.
‘The outcome of the struggle is clear: English is the language of science, technology, travel, entertainment and sport. To be a citizen of the world it is the one language that you must have.’

Now, when you come to look at the whole piece (unfortunately accessible only via an FT subscription; see note 1), most of it focusses on language policy in La Francophonie, the group of countries in which French is spoken, a hangover from France’s colonial past. That’s mostly too political, historical and economic for a linguist in training like me to get my teeth into. But I would like to comment on a couple of parts of Paxman’s argument, to give the view from linguistics.

A temptation to be avoided?

Firstly, one of the reasons he gives for stopping the teaching of French in Francophone countries is the dominance – coup de grâce, even – of English. That’s the only useful language to know, being “the language of science, technology, travel, entertainment and sport.” But wait, not so fast, Mr Paxman. Granted, English has more first-language and second-language speakers than French (300–400 million first-language and up to 1bn second-language speakers, compared to 80 million and 220 million), and granted, it is widely used, especially in academia. But would it come as a surprise to know that less than half of internet content is in English, or that around 6bn people, over 80% of the world’s population, do not speak English at all? If cross-cultural communication is something we care about, then English is not the only language worth knowing.

Secondly, the implication for us Brits seems to be ‘don’t bother with French’. Well, okay, Paxman does make it a bit more nuanced than that: “If you are a native English speaker, by all means learn Chinese or Arabic or Spanish. If you must, study French, because it is a beautiful language. But let us have no truck with suggestions that it is much worth learning as a medium of communication.” Thankfully, he’s not advocating not learning any foreign languages (although you might think the paean to the usefulness of English strongly implies it), and, personally, I might well agree that, given a choice, it would be good to have more people learning Mandarin, Urdu, Farsi or Russian (to take a handful currently required by GCHQ) rather than French. But, given the intertwined sociolinguistic history we share with our neighbours, learning French can be a fascinating way into language learning. Learning one foreign language equips you with metalinguistic knowledge and cognitive strategies that help you when learning another, so as long as French remains, sadly, the only option for some pupils in the UK, it should not be discouraged. Not to mention the many uses it still has in business and diplomacy (to take one example, the UK is one of the top 6 foreign investors in Morocco, where French is the business language).

The really unfortunate thing about Paxman’s opinion piece – and of course, he is entitled to an opinion – is that it’s full of pithy pull-outable lines that have the potential to cause much more damage out of context, the worst offender being: ‘the real problem with French is that it is a useless language.’ If you’re calling any language useless, you have to ask ‘for what?’ and ‘for whom?’ It may be that some languages are politically or culturally more strategic to learn for different people at different times, but no language, while still alive, is ever useless – for its speakers, however few or many, it is their means of communication, and therefore incredibly useful.

1. COMMENT Voilà – a winner in the battle of global tongues; Opinion
By Jeremy Paxman, 8 April 2016, Financial Times

Topicalization in Chinese: A game of efficiency and compromise

I was sitting in a Chinese restaurant with my friend Lucas in Paris.

“Wǒ zhèzhǒng shūcài zuì xǐhuān le (I, this kind of vegetable, like the best)!” I couldn’t help yelling out my excitement at the sight of the appetizing hot-pot vegetables on the table.

Wait! What did I say? A soft but firm voice spoke up in my head. Yes, I had just uttered a somewhat “weird” sentence in Mandarin Chinese (my native language). It’s weird because Mandarin is assumed to be an SVO language; that is, the object usually comes after the verb. For example, “I like Julio” in Mandarin is wǒ xǐhuān Julio (I-like-Julio) rather than *wǒ Julio xǐhuān (I-Julio-like) (star = ungrammatical).

However, after pondering for a while, I decided to accept my weird utterance, because I realized this was one of those “ineffable” situations. There was simply no way to express my excitement and obey the grammatical rules at the same time! Actually (1) is the standard way of saying “I like this kind of vegetable the best” in Mandarin, but I can hardly think of any scenario where I would really use it without sounding too textbook-ish.

(1) Wǒ     zuì      xǐhuān  zhèzhǒng    shūcài          le!

I              most   like       this-kind    vegetable    LE

“I like this kind of vegetable the best!”

So what happened to make poor me utter weird things in a Paris restaurant? Was it because I was too hungry? Not necessarily. Similar sentences are produced by Chinese speakers all the time! For example (ASP=Aspect marker, SFP=Sentence final particle),

(2)   a.  tóu xǐ le ma? (you-head-wash-ASP-SFP; have you washed your hair?)

 b.  Wǒ zuòyè xiěwán le. (I-homework-finish-ASP; I have finished my homework.)

 c.  Xiǎohóng qiánbāo diū le. (Xiaohong-wallet-lose-ASP; Xiaohong lost her wallet.)

Actually, linguists wouldn’t find such sentences too weird, because what’s involved here is not really SOV ordering but topicalization (making something the topic of a sentence). So far so good. But what struck me on that hot-pot day was how often we actually use topicalization in daily life. The answer is “a looooooot”! It has become an everyday routine rather than a stylistic option.

Yet you may wonder: what’s the price of topicalization? Good question! The price is that sometimes we confuse ourselves. For example, ever since elementary school I’ve been wondering how to say the following sentence more nicely:

(3) Wǒmā      xiǎoshíhòu    zǒngshì      dǎ     wǒ.

my-mom       as a child    always       beat  me

“When I/my mom was a child, my mom/she always beat me.”

My mom always beat me when I was a child! (by Bokai Huang)[1]

Fortunately, this isn’t true for me, but after so many years I still don’t understand why my compatriots produce sentences like this. Maybe this is another compromise between expressiveness and grammaticality (in a loose sense), since the clearer, unambiguous versions of (3) sound just as bad (if not worse), as in (4).

(4)   a.  ?Xiǎoshíhòu wǒmā zǒngshì dǎ wǒ. (as a child-my mom-always-beat-me)

b.  ?Wǒ xiǎoshíhòu wǒmā zǒngshì dǎ wǒ. (I-as a child-my mom-always-beat-me)

So why is it so difficult to say what we mean? Because (3) not only involves topicalization (this time it is the subject that gets topicalized), but also has an embedded null subject (Oops, Chinese is one of those radical pro-drop languages, where things like subject and object can be happily and wildly omitted). The problematic chunk xiǎoshíhòu is actually part of a phrase (dāng/zài) XX xiǎoshíhòu “(when) XX was a child”, e.g.

(5) a. Míngyuè xiǎoshíhòu xǐhuān hē chá.

“When Mingyue was a child, she liked drinking tea.”

 b. Qíng’er xiǎoshíhòu yǎng guò yìzhī xiǎogǒu.

“When Qing’er was a child, she had a pet puppy.”

 c. Liánlian xiǎoshíhòu zǒngshì kǎo yìbǎi fēn.

“When Lianlian was a child, she always got 100 marks.”

So, (3) can have either (6a) or (6b) as its underlying structure (parentheses=being dropped or deleted).

(6)   a.  [Topic wǒmā   [A (zài wǒ) xiǎoshíhòu    [B zǒngshì    [vP (wǒmā)   dǎ wǒ ]]]]

my mom                          when I  as a child       always         my mom     beat me

“My mom, when I was a child, always beat me.”

b.  [A wǒmā      xiǎoshíhòu [B zǒngshì  [vP (wǒmā)        dǎ wǒ ]]]

my mom            as a child        always       my mom      beat me

“When my mom was a child, she always beat me.”

(Technical details like displacement are omitted. Simply treat A/B as two chunks adjoined to the verbal core vP “(my mom) beat me”, which assumes the basic word order SVO.)

Of course, (6b) goes against most people’s real-world knowledge, because when “my mom” was a child, “I” probably didn’t exist at all! But we live in a curious world, and one of the most fascinating characteristics of natural languages is precisely their capacity to express even the most improbable things. Therefore, although (6b) is pragmatically marked, it is grammatically well-formed.

So, with all the imperfections of topicalization (as in my sudden enlightenment in the hot-pot restaurant and the imaginary world where poor kids are abused by their child-moms), why do we still love it so much?

Well, as I said, it’s a compromise between expressiveness and grammaticality (still in a loose sense). In real-life communication, language first and foremost serves to express meanings and emotions. So who wins in this game of efficiency and compromise? Mostly expressiveness, especially in colloquial language. This is also one of the biggest differences between colloquial language and “ideal” (or, less ideally, written/textbook) language. In the latter register, for example, (3) might well be rendered more nicely as (7).

(7)   Zài   wǒ   xiǎodeshíhòu,  māma  zǒngshì  bùfēn qīnghóngzàobái de   dǎ      wǒ.

when       I      as a child         mom    always   indiscriminately                beat   me

“When I was a child, my mom always beat me without clear reasons.”

(7) is not only nicer from a grammatical perspective but also more natural on a narrative level. However, real life isn’t storytelling, and speaking like (7) all the time can be hard work (probably not for literature lovers). Hearing such sentences constantly can also be exhausting; they simply don’t suit the colloquial register.

Abstracting away from the register issue, a more technical problem facing linguists is deciding what forms part of the internal language system (I-language) and what reflects real-life compromises. The former is significant in revealing the essence of our language instinct, while the latter provides less evidence to this end. In linguistics jargon, this is a question of Competence vs. Performance. But as we have seen, the boundary between the two is often blurred in the data we have access to (it’s a pity we can’t look directly into speakers’ minds).

Last but not least, the vegetable I was excited about in the hot-pot restaurant was “needle mushroom” (jīnzhēngū)! It’s the best, especially with beef!

Needle mushroom with beef (Jīn Zhēn Gū Féi Niú)[2]

Picture sources:

[1] http://weibo.com/huangzhigaojian?from=profile&wvr=6

[2] http://261925957.blog.sohu.com/307124194.html