Abstract
Santos 1999e
Diana Santos. "Comparação de corpora em português: alguns comentários", http://www.linguateca.pt/Diana/download/CCP.ps.
Translation of the title: Corpus comparison in Portuguese: some comments
In this paper I make some preliminary comments on corpus comparison in Portuguese,
dealing with:
- orthographic properties (capitalization, hyphenation, token form, etc.)
- the 100 most frequent words
- the 30 most frequent nouns
- proper noun frequency and structure
- the 15 most frequent one-word proper nouns
- perception verb frequencies
- the frequency of localizers (where and when)
In an introductory section, I report on some problems in encoding the six corpora dealt with, which come from widely different sources and formats. One interesting remark concerns the (mis)encoding of footnotes.