GSCP 2014 abstract

Comparing oral (transcribed) and written corpora in Portuguese

Diana Santos
After briefly presenting AC/DC, a large repository and service for many diverse corpora of Portuguese (Santos, 2011), I will use it to illustrate possible research on orality and possible orality markers.

Following Biber's (1988) work on features of writing, done for English but later extended to other languages, the time is ripe for similar explorations of Portuguese, like those included e.g. in Biber et al. (1999) or Biber & Gray (2010), though not necessarily with the same features or even the same methodology. After all, I am aware that the written conventions of Portuguese are rather different from those of English, as beautifully pointed out by Bennett (2010).

In the presentation, I will, using quantitative data, look at three issues:

  1. vocative and second person use (extending Freitas & Santos 2014),
  2. lexical bundles,
  3. passive, extending Santos (2014).

This will serve as an appetizer for discussing a corpus-based grammar of Portuguese that is in the making, Gramateca, which has as a special feature the use of some semantic annotation as well. See Santos (2014) for methodological issues, and Maia & Santos (2012) for a preliminary example for the fear domain.

The corpora I will be making use of can be described at a glance by the following maps:

Oral corpora

Political speeches, Soccer commentaries, TV debates, Parliament discussions, Informal speech, Interviews, Plays

Written corpora

Political newspaper, Local newspaper, Global newspaper, Book reviews by students, Thematic newspaper, Thematic mailing list, Blogs, Magazines/journals, Cookbook, Web pages, (Mail) spam, Encyclopedia, Unedited local newspaper, Legal text, Literary works, Letters to the editor, Translations, Essay, Academic writing, Technical


Last modified: 13 February 2014