Abstract

Rocha & Santos 2000

Paulo Alexandre Rocha & Diana Santos. "CETEMPúblico: Um corpus de grandes dimensões de linguagem jornalística portuguesa", in Maria das Graças Volpe Nunes (ed.), Actas do V Encontro para o processamento computacional da língua portuguesa escrita e falada (PROPOR'2000) (Atibaia, São Paulo, Brasil, 19 a 22 de Novembro de 2000), pp. 131-140.

Translation of the title: CETEMPúblico: A large corpus of Portuguese newspaper text


This paper reports on the creation of CETEMPúblico, the largest publicly available corpus of Portuguese to date, containing 180 million words and created to boost research in language engineering for Portuguese. After providing some background on its creation, we focus on the processing required, explaining in detail some of the options taken, namely: