Santos 99b

Santos, Diana. "Disponibilização de corpora através da WWW". In Palmira Marrafa & Maria Antónia Mota (eds.), Linguística Computacional: Investigação Fundamental e Aplicações. Actas do I Workshop sobre Linguística Computacional da Associação Portuguesa de Linguística (Lisboa, 25-27 de Maio de 1998), Lisboa: Colibri, 1999, pp. 323-346.

Translation of the title: Making corpora available through the WWW

In this paper I present the advantages of making corpora available on the World Wide Web and propose the establishment of a resource network of Portuguese corpora.

First, I discuss the advantages of having text corpora and making them available, both for corpus providers and for corpus users and, more generally, for anyone interested in studying language and/or working with natural language processing.

I then suggest some reasons why so few Portuguese corpora have so far been made generally available, describing legal and economic problems, technical difficulties, and cultural impediments.

Using my previous work with the Oslo Corpus of Bosnian Texts as an example, I list the advantages of making corpora available on the Web from three perspectives: the general properties of WWW-based systems, the specific advantages for corpus compilers, and the specific advantages for corpus users.

Finally, I suggest organizing Evaluation Contests for Portuguese based on available corpus data.
