Introduction to COMPARA

What is COMPARA?

COMPARA is a bidirectional parallel corpus of English and Portuguese. In other words, it is a type of database with original and translated texts in these two languages that have been linked together sentence by sentence.

What is COMPARA for?

COMPARA is a tool that helps users study human translation and contrast English and Portuguese automatically. For example, when we enter a Portuguese word in COMPARA, we can see how that word has been translated into English in different contexts. See an example.
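As a rough illustration of what a look-up in a sentence-aligned corpus involves, here is a minimal Python sketch. It is not COMPARA's actual implementation: the `AlignedPair` structure, the `lookup` function, the direction codes and the three toy sentence pairs are all invented for this example and do not come from the corpus.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """One source sentence linked to its translation (hypothetical structure)."""
    source: str       # sentence in the original text
    translation: str  # corresponding sentence in the translation
    direction: str    # "pt-en" or "en-pt" (labels assumed for this sketch)


def lookup(corpus, word, direction):
    """Return the aligned pairs whose source sentence contains `word` as a whole token."""
    needle = word.casefold()
    hits = []
    for pair in corpus:
        if pair.direction != direction:
            continue
        tokens = re.findall(r"\w+", pair.source.casefold())
        if needle in tokens:
            hits.append(pair)
    return hits


# Toy data made up for illustration only; these are not sentences from COMPARA.
corpus = [
    AlignedPair("Ela tem saudade da casa.", "She misses home.", "pt-en"),
    AlignedPair("A saudade não passa.", "The longing does not fade.", "pt-en"),
    AlignedPair("He went home.", "Ele foi para casa.", "en-pt"),
]

for pair in lookup(corpus, "saudade", "pt-en"):
    print(pair.source, "=>", pair.translation)
```

Because every source sentence carries its translation with it, a single search shows the same word rendered in different ways depending on context, which is the kind of contrast described above.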

Who uses COMPARA?

Linguists and engineers working with natural language processing use COMPARA to develop computational tools for English and Portuguese.

Lexicographers use the corpus to improve bilingual dictionaries.

COMPARA is also being used in descriptive and empirical research in Translation Studies.

Lecturers in translation can use COMPARA to prepare exercises and discuss translation problems with students.

Language teachers have been using COMPARA to make exercises and tests for learners of Portuguese and of English.

Professional and student translators use COMPARA to look up linguistic and functional equivalents between English and Portuguese.

Anyone who works with Portuguese and English can use COMPARA as a kind of bilingual dictionary and grammar.

COMPARA has been receiving over ten thousand look-ups per month from all over the world.

How can COMPARA be used?

To start using COMPARA, simply click on simple search or on advanced search. Access to COMPARA is completely free. If you have any questions, check our Search help and Tutorial.

What texts make up COMPARA?

At the moment, COMPARA is made up of 75 text pairs. These texts are extracts from published literary source texts and their translations, from Angola, Brazil, Mozambique, Portugal, South Africa, the United Kingdom and the United States. Only direct English-Portuguese and Portuguese-English translations are admitted to the corpus.

The following authors, translators and publishers are represented in COMPARA:


Aluísio Azevedo, Autran Dourado, Camilo Castelo Branco, Chico Buarque, David Lodge, Eça de Queirós, Edgar Allan Poe, Henry James, Ian McEwan, Jô Soares, Joanna Trollope, Jorge de Sena, José Cardoso Pires, José de Alencar, José Eduardo Agualusa, José Saramago, Joseph Conrad, Julian Barnes, Kazuo Ishiguro, Lewis Carroll, Lídia Jorge, Machado de Assis, Manuel Antônio de Almeida, Marcos Rey, Mário de Carvalho, Mary Shelley, Mia Couto, Nadine Gordimer, Oscar Wilde, Patrícia Melo, Paulo Coelho, Richard Zimler, Rubem Fonseca and Sá Carneiro.


Adria Frizzi, Alan Clarke, Alexis Levitin, Alice Clemente, Ana Falcão Bastos, Ana Luísa Faria, Ana Maria Amador, Aníbal Fernandes, Carlos Grifo Babo, Cliff Landers, Cristina Ferreira de Almeida, Cristina Rodriguez, David Brookshaw, David Rosenthal, Eduardo Guerra Carneiro, Elizabeth Lowe, Ellen Watson, Fernanda Pinto Rodrigues, Geraldo Galvão Ferraz, Giovanni Pontiero, Graeme Mac Nicoll, Gregory Rabassa, Helen Caldwell, Helena Cardoso, Isabel Burton, J. Teixeira de Aguilar, Januário Leite, John Byrne, John Gledson, John Parker, John Vetch, José Viera Lima, Lídia Cavalcante-Luther, Lucinda Santos Silva, Luís Lobo, M. F. Gonçalves de Azevedo, Manuel João Gomes, Margaret Jull Costa, Maria Carlota Pracana, Maria do Carmo Figueira, Mário Martins de Carvalho, Mary Fitton, Natália Costa, Nina Videira, Paula Reis, Peter Bush, Richard Zenith, Ronald W. Sousa and Yolanda Artiaga.


Associated University Presses, Ática Editorial, Carcanet Press, Dedalus, Edições 70, Edições Asa, Editora Difusão Cultural, Editora Siciliano, Editora Scipione, Editora Vega, Editorial Caminho, Editorial Estampa, Editorial Teorema, Fairleigh Dickinson University, Gradiva Publicações, Gávea Brown Publications, J.M. Dent, Louisiana State University Press, Oxford University Press, Picador, Publicações Dom Quixote, Quetzal Editores, Trafika, University of California Press and University of Minnesota Press.

For more information, check the Bibliographic references.

Can the texts in COMPARA be copied or read?

Many of the works represented in COMPARA are contemporary and are protected by copyright law. For this reason, we store no more than 30% of a work in the corpus and, in each look-up, users can access only up to around 30% of that 30% (i.e., about 10% of a complete work). The results are presented to users as isolated, randomly ordered sentences, and this is all they are able to copy. Every sentence is linked to its bibliographic reference so that users can cite it correctly or find out from which publisher they can purchase the full work in question.
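The "about 10%" figure follows from multiplying the two limits. The short Python sketch below works through the arithmetic; the function name and the 100,000-word work are assumptions made for illustration, not figures from COMPARA.

```python
STORED_PERCENT = 30      # upper limit on the share of a work stored in the corpus
PER_LOOKUP_PERCENT = 30  # approximate share of the stored text reachable in one look-up


def max_words_per_lookup(work_words):
    """Upper bound on the words of one work that a single look-up can expose."""
    stored_words = work_words * STORED_PERCENT // 100
    return stored_words * PER_LOOKUP_PERCENT // 100


# Share of the complete work reachable per look-up, in percent.
share_percent = STORED_PERCENT * PER_LOOKUP_PERCENT / 100

# Hypothetical 100,000-word novel (an assumed length, used only as an example).
print(share_percent)                  # 9.0, i.e. roughly 10%
print(max_words_per_lookup(100_000))  # 9000
```

In other words, for a work of that assumed length at most 30,000 words would be stored, and a single look-up could reach at most around 9,000 of them.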

How big is the corpus?

COMPARA is currently the largest post-edited Portuguese-English parallel corpus in the world, totalling around three million words. We hope to continue expanding the corpus with more texts and more genres. Regularly updated information on the size of the corpus is available in the Quantitative summary.

Project funding

COMPARA is a non-commercial, academic research project. It is hosted by the Linguateca project of the Fundação para a Computação Científica Nacional (FCCN), and current funding comes from contract ref. POSC/339/1.3/C/NAC. COMPARA was initially sponsored by the Fundação para a Ciência e Tecnologia (Portugal), the Instituto Superior de Línguas e Administração (ISLA) in Lisbon, Oxford University Language Centre, and FCCN through POSI/PLP/43931/2001.