Advanced Search Help
Help topics
- 1. Select search direction
- 2. Enter query
- 2.1 Word or expression query
- 2.2 Further query options
- 3. Use only a specific part of the corpus (optional)
- 3.1 Choose specific varieties of Portuguese and English
- 3.2 Select dates of publication
- 3.3 Distinguish between source texts and translations
- 3.4 Search only within specific texts
- 3.5 Search within specific authors
- 4. Choose other types of output
1. Select search direction
Leave the "From Portuguese to English" option checked if you wish to enter a word or expression in Portuguese and see its English equivalent. If you wish to reverse the search direction, check the "From English to Portuguese" option.
2. Enter query
2.1 Word or expression query
This is where you type in the word or expression you wish to search for. It can be a word, a combination of words, a prefix, a suffix or one of many other elements of the language.
As COMPARA contains only literary texts, the queries that work best are those involving general vocabulary, literary language and grammatical words (e.g. articles, pronouns, prepositions and conjunctions). You are unlikely to find technical terms in COMPARA, because our translation database does not, for now, include technical texts.
We also advise you not to search for full sentences. To get good results from corpora, it is better to work with shorter chunks of language, which are more likely to have been used before. In other words, your searches will be more successful if you look up single words and conventional expressions (for example, as soon as possible) rather than unique utterances (for example, I enjoy drinking tea before going to bed).
Below are a few examples of queries. Note that you have to surround each word or part of a word in your query with double quotation marks.
What you want to search | Example of what to type in | What can be retrieved |
single word | "this" | this |
two or more words | "like" "this" | like this |
two words with any single word in between | "a" ".*" "time" | a long time, a bad time, a short time, a good time, etc. |
two words with zero to three words in between | "give" []* "up" within 3 | give up, give it up, give them up, give it all up, etc. |
word with alternative spellings | "reali[sz]e" | realise, realize |
word with alternative spellings | "colo(u)?r" | colour, color |
alternative words | "(big|great)" | big, great |
words beginning with "dis" | "dis.*" | dislike, disgusting, disappointing, disappear, disco, district, etc. |
words ending with "ly" | ".*ly" | finally, hopefully, beautifully, sadly, Sally, Billy, holy, etc. |
punctuation marks | "\!" | ! |
punctuation marks | "\?" | ? |
punctuation marks | "\," | , |
double quotation marks | "(\«|\»)" | «, » |
single quotation marks | "(\`|\´)" | `, ´ |
Portuguese "travessão" (dash) | "\--" | -- |
singular and plural | [lema="boy"] | boy, boys |
verb inflections | [lema="go"] | go, goes, going, gone, went |
verbs before a word or expression | [pos="V.*"] "a" "mistake" | was a mistake, make a mistake, been a mistake, making a mistake |
adjectives before a word | [pos="JJ.*"] "love" | true love, holy love, free love, etc. |
adverbs after a word | "died" [pos="R.*"] | died away, died out, died suddenly, etc. |
nouns after a specific verb | [lema="commit"] [pos="N.*"] | committed suicide, commits adultery, commit injustices, etc. |
the nominal form of a word that can be a noun and a verb | [word="can" & pos="N.*"] | can (as in a can of beer; excludes the modal verb can) |
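The quoted patterns in the table are regular expressions. In CQP, the query language of the IMS Corpus Workbench, a pattern must match a whole word rather than part of one, which is why ".*ly" also retrieves Sally and holy but not lying. The Python sketch below imitates that behaviour with re.fullmatch. It is only an illustration: the word list is made up, the helper names matches() and retrieve() are hypothetical, and Python's regular-expression dialect is assumed to behave like CQP's for these simple patterns.

```python
import re

# Made-up sample of word forms, for illustration only (not COMPARA data).
WORDS = ["finally", "Sally", "holy", "lying", "realise", "realize", "realising",
         "colour", "color", "discolour", "big", "bigger", "great", "dislike", "disco"]


def matches(pattern, word):
    """True if `pattern` matches the whole word (CQP-like anchoring is assumed)."""
    return re.fullmatch(pattern, word) is not None


def retrieve(pattern):
    """List the sample words that a single-word query with `pattern` would retrieve."""
    return [w for w in WORDS if matches(pattern, w)]


print(retrieve(".*ly"))         # ['finally', 'Sally', 'holy']
print(retrieve("reali[sz]e"))   # ['realise', 'realize']
print(retrieve("colo(u)?r"))    # ['colour', 'color']
print(retrieve("(big|great)"))  # ['big', 'great']
print(retrieve("dis.*"))        # ['discolour', 'dislike', 'disco']
```

Note that realising, bigger and discolour are left out by the patterns that would only partly match them; this whole-word behaviour is what makes prefix and suffix queries such as "dis.*" and ".*ly" work.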
For more detailed information on different ways of searching, please refer to the IMS Corpus Workbench syntax.
For more information on queries that involve English grammar, please refer to the CLAWS tagset and COMPARA's English annotation with CLAWS C7: revision criteria.
For more information on queries that involve Portuguese grammar, please refer to the Documentação da anotação morfossintáctica da parte portuguesa do COMPARA (documentation of the morphosyntactic annotation of the Portuguese part of COMPARA).
Alignment constraints (optional)
When you search for a word or expression in one language of the corpus, the alignment constraint option lets you keep only the results whose text in the other language contains, or does not contain, a word or expression of your choice, which you type in the box provided for alignment constraints. For example, if you type finally in the left-hand search box and finalmente in the alignment constraint box, your results will be limited to concordances with finally on the English side and finalmente on the Portuguese side. Conversely, to retrieve concordances with finally on the English side but without finalmente on the Portuguese side, type finally in the search box on the left and !finalmente (preceded by an exclamation mark) in the alignment constraint box.
Here are a few examples of finally aligned with finalmente:
EBDL2(800): | And to make the break with Cambridge somehow entailed breaking finally with Charles. | E o facto de se separar de Cambridge implicava, de algum modo, separar-se finalmente de Charles. |
EBDL3T1(889): | The girl finally got the door open and he went in. | A rapariga conseguiu finalmente abrir a porta e entrou. |
EURZ1(758): | It seems that courage has finally blessed the priest. | Parece que a coragem visitou finalmente o frade. |
In contrast, here are a few examples of finally not aligned with finalmente:
EBIM3(702): | The bedclothes were finally in place, and Julie came and stood by us at the foot of the bed. | A roupa da cama ficou por fim no seu lugar e Julie veio-se juntar a nós, aos pés da cama. |
EBJB3(395): | «I'm sorry, sir,» he said finally. | «Desculpe, sir», disse por fim. |
PBAA2(775): | Raimundo's mother was finally able to repose. | A mãe de Raimundo conseguiu enfim descansar. |
Note that in both cases there is no underlying word alignment: the system simply detects parallel concordances with finally on the English side and with (or without) finalmente on the Portuguese side. As a result, a few concordances may contain finally on the English side and finalmente on the Portuguese side even though one word is not the translation of the other.
Alignment constraints are optional. If you don't want to search both languages at the same time, leave this field empty.
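The point that there is no word alignment can be illustrated with a minimal Python sketch of a sentence-level alignment constraint. This is not COMPARA's implementation: the three alignment units are copied from the examples above, the tokenizer is a crude stand-in, search() is a hypothetical helper, and the search direction is fixed to English to Portuguese. The constraint only checks whether the Portuguese side of each unit contains the word anywhere.

```python
import re
from typing import Optional

# Three alignment units copied from the examples above: (code, English, Portuguese).
UNITS = [
    ("EBDL3T1(889)", "The girl finally got the door open and he went in.",
     "A rapariga conseguiu finalmente abrir a porta e entrou."),
    ("EBJB3(395)", "«I'm sorry, sir,» he said finally.",
     "«Desculpe, sir», disse por fim."),
    ("PBAA2(775)", "Raimundo's mother was finally able to repose.",
     "A mãe de Raimundo conseguiu enfim descansar."),
]


def tokens(text):
    """Crude tokenizer for illustration; COMPARA's own tokenization differs."""
    return re.findall(r"\w+|[^\w\s]", text)


def contains(text, pattern):
    """True if some token matches `pattern` as a whole (CQP-like anchoring assumed)."""
    return any(re.fullmatch(pattern, tok) for tok in tokens(text))


def search(units, query, constraint: Optional[str] = None):
    """Hypothetical helper: codes of units whose English side contains `query` and
    whose Portuguese side contains `constraint` (or lacks it, if it starts with '!')."""
    hits = []
    for code, english, portuguese in units:
        if not contains(english, query):
            continue
        if constraint:
            negated = constraint.startswith("!")
            word = constraint[1:] if negated else constraint
            # Sentence-level check only: no word alignment is consulted.
            if contains(portuguese, word) == negated:
                continue
        hits.append(code)
    return hits


print(search(UNITS, "finally", "finalmente"))   # ['EBDL3T1(889)']
print(search(UNITS, "finally", "!finalmente"))  # ['EBJB3(395)', 'PBAA2(775)']
```

Because the check looks at the whole Portuguese unit, a unit would also pass the first query if finalmente appeared there as the translation of some other word.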
2.2 Further query options
COMPARA allows you to retrieve a number of extra things in addition to, or instead of, words or search expressions:
Translators' notes retrieves notes that were added by the translator.
Titles retrieves both real and fictional titles of books, newspapers, magazines, films, plays, television programmes, songs, etc. cited in the corpus texts. Note that this option does not give you the titles of the corpus texts themselves; if you wish to find out the titles of the texts that make up the corpus, click on the Bibliographic references of the texts in COMPARA.
Foreign words and expressions retrieves words and expressions in a language different from the main language of the text that have been highlighted (normally in italics) by the author, translator or publisher.
Words and expressions highlighted for emphasis retrieves words and expressions within a sentence that the author or the translator has highlighted for emphatic purposes.
Named entities retrieves proper names used to identify brands, shops, hotels, companies, products, doctrines, etc. that have been highlighted in the printed edition.
Because of the way the texts in COMPARA have been aligned, you can also retrieve sentences that have been joined, split, added, deleted or reordered in translation. More information on the sentence separation criteria adopted is available here.
3. Use only a specific part of the corpus (optional)
3.1 Choose specific varieties of Portuguese and English
COMPARA is open to all varieties of Portuguese and of English. If you wish to search only within a specific set of varieties, this is where you select them. Any combination can be selected: for example, you can choose all varieties of Portuguese and only British English, or only Brazilian Portuguese and American English, and so on. However, some combinations do not exist in the corpus. If you select South African English translated into Angolan Portuguese, you will not get any results because this combination is not available. To see exactly which combinations are present in the corpus, see the Distribution of parallel texts per language varieties.
3.2 Select dates of publication
COMPARA includes both contemporary and non-contemporary texts. If you wish to search only within recent texts or only within older ones, this is where you can restrict your search to texts published before or after a year of your choice. The selection is based on the date of the first edition of the text, even if the corpus extract is taken from a later edition. For more information, see the Bibliographic references of the texts in COMPARA.
3.3 Distinguish between source texts and translations
COMPARA separates English from Portuguese, but does not, by default, make a distinction between original and translated Portuguese or original and translated English. If it is important that you distinguish between translational and non-translational language, you should select one of the options below.
Search only from source texts to translations
When your search direction in step one goes from English to Portuguese, this option allows you to retrieve only texts originally written in English aligned with their translations into Portuguese. Conversely, when your search direction goes from Portuguese to English, you will only retrieve texts originally written in Portuguese aligned with their translations into English.
Search only from translations back to source texts
When your search direction in step one goes from English to Portuguese, this option allows you to see how English translations relate back to their Portuguese source texts. Conversely, when your search direction goes from Portuguese to English, this option allows you to retrieve only translated Portuguese texts aligned with their respective English source texts.
3.4 Search only within specific texts
COMPARA allows you to create a sub-corpus of your choice by defining which pairs of texts within the corpus you wish to use. For example, you can select only the texts by one particular author or translator. This option automatically overrides all the previous narrowing options. Thus if, for example, you select a text by the South African author Nadine Gordimer, you will not be able to remove South African English from your subset of COMPARA.
Each pair of texts is represented by a code linked to the Bibliographic references of the texts in COMPARA, where you can obtain their full reference plus information on text length and language variety. Codes beginning with P represent texts originally written in Portuguese; codes beginning with E represent texts originally written in English. The texts that make up COMPARA are normally extracts corresponding to 30% of a book, taken from its beginning, middle or end.
3.5 Search within specific authors
This function helps you to search within the texts by specific authors. You can use it to see how the authors represented in COMPARA use certain words and expressions. If you select Julian Barnes, for example, your search will only apply to the texts by Julian Barnes in the corpus.
4. Choose other types of output
By default, the results you get are presented in the form of parallel concordances. Use the options in this section to choose other types of output.
Concordances
Parallel concordances show the key word or expression you entered in bold on the left-hand side of the screen. On the right-hand side you will find the corresponding text in the other language.
Don't expect the source text always to be on the left-hand side and the translation always on the right-hand side of your screen. In the results, one column is for Portuguese and the other for English, rather than one for source texts and the other for translations, so you may get both source texts and translations on either side of your screen. If you wish to distinguish between source texts and translations, you must specify this in step 3.3 of the search form.
You can check where a concordance comes from by pointing the mouse at the blue code on the left. Click on the code to see the full reference of the pair of texts in question.
Parallel concordances show your search expression within the context of one alignment unit. An alignment unit in COMPARA is always one full source-text sentence and the corresponding text in the translation, which is not necessarily a single full sentence (remember that translators don't always translate a text sentence by sentence).
If you tick the show alignment properties option, you will be able to see whether a source-text sentence has been split into two (1:2), deleted (1:0), joined with another sentence (1:1/2), and so on.
If you have selected translators' notes in step 2.2, the hide translators' notes option lets you see which concordances contain translators' notes without displaying the text of the notes. By default, no notes are shown unless you have selected them in step 2.2.
Distribution of forms
This option may be useful if your search string allows for a variety of words or word sequences. For example, if you want to see how English adjectives ending in ish have been translated into Portuguese, you may find it useful to see the distribution of forms of all English words in the corpus ending in ish. This will enable you to see how many adjectives like childish and bookish there are, and separate them from words ending in ish which are not adjectives, like goldfish and accomplish.
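A distribution of forms amounts to counting how often each distinct word form that matches the query occurs. The sketch below works on a made-up token list rather than real COMPARA data, assumes the same whole-word matching as CQP, and uses a hypothetical function name.

```python
import re
from collections import Counter

# Made-up token sequence, for illustration only (not COMPARA data).
TOKENS = ["a", "childish", "remark", "bookish", "goldfish", "childish", "to",
          "accomplish", "foolish", "childish", "goldfish"]


def distribution_of_forms(tokens, pattern):
    """Count each distinct form that matches `pattern` as a whole word, most frequent first."""
    return Counter(t for t in tokens if re.fullmatch(pattern, t)).most_common()


print(distribution_of_forms(TOKENS, ".*ish"))
# [('childish', 3), ('goldfish', 2), ('bookish', 1), ('accomplish', 1), ('foolish', 1)]
```

Separating adjectives such as childish and bookish from goldfish and accomplish is then a matter of reading through the list of forms.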
Distribution of part-of-speech category
This option distinguishes between the part-of-speech categories of ambiguous words which can belong to more than one category. For example, the word can can be a noun or a verb. To find out how many times in the corpus it has been classified as a noun and how many times as a verb, write can in the search box and select the part-of-speech distribution option.
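Assuming each token carries a CLAWS C7 tag, the part-of-speech distribution of can is simply a count of the tags attached to its occurrences. The tagged tokens below are invented for illustration, and the coarse() helper, which collapses tags to their first letter (N for nouns, V for verbs), is a hypothetical extra rather than a COMPARA option.

```python
from collections import Counter

# Made-up tagged tokens (word, CLAWS C7 tag), for illustration only:
# VM = modal verb, NN1 = singular common noun, VV0 = base form of a lexical verb.
TAGGED = [
    ("I", "PPIS1"), ("can", "VM"), ("open", "VV0"), ("a", "AT1"), ("can", "NN1"),
    ("of", "IO"), ("beer", "NN1"), ("you", "PPY"), ("can", "VM"), ("go", "VV0"),
]


def pos_distribution(tagged, word):
    """Count the tags of every occurrence of `word` (exact, case-sensitive match)."""
    return Counter(tag for w, tag in tagged if w == word)


def coarse(distribution):
    """Collapse fine tags to their first letter, e.g. N for nouns and V for verbs."""
    result = Counter()
    for tag, n in distribution.items():
        result[tag[0]] += n
    return result


dist = pos_distribution(TAGGED, "can")
print(dist)          # Counter({'VM': 2, 'NN1': 1})
print(coarse(dist))  # Counter({'V': 2, 'N': 1})
```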
Distribution of lemma
This option groups together the different inflections of a word. For example, if you want to find out what the most frequent verb in a given text is without distinguishing between different verb inflections, type [pos="V.*"] in the search box, select the text you want to analyse in step 3.4 of the form and tick the distribution of lemma option. Different verb inflections will be grouped together for counting purposes. Thus the frequency of a lemma such as write will include all occurrences of write, writes, wrote, written and writing.
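Likewise, a distribution of lemma can be sketched as two steps: keep the tokens whose tag matches V.*, then count them by lemma instead of by word form. The (word, tag, lemma) triples are made up, and the lemma values are assumed to be supplied by the annotation, as in the [lema="..."] queries in step 2.1.

```python
import re
from collections import Counter

# Made-up (word, CLAWS C7 tag, lemma) triples, for illustration only.
TAGGED_TOKENS = [
    ("She", "PPHS1", "she"), ("wrote", "VVD", "write"), ("letters", "NN2", "letter"),
    ("and", "CC", "and"), ("writes", "VVZ", "write"), ("still", "RR", "still"),
    ("He", "PPHS1", "he"), ("went", "VVD", "go"), ("writing", "VVG", "write"),
    ("was", "VBDZ", "be"), ("written", "VVN", "write"),
]


def lemma_distribution(tokens, pos_pattern):
    """Count tokens whose tag fully matches `pos_pattern`, grouped by lemma."""
    return Counter(lemma for _word, tag, lemma in tokens
                   if re.fullmatch(pos_pattern, tag)).most_common()


print(lemma_distribution(TAGGED_TOKENS, "V.*"))  # [('write', 4), ('go', 1), ('be', 1)]
```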
Distribution of verb tense, case or degree (available only for Portuguese)
This option can be used to distinguish between the tenses, cases or degrees of the lemmas under analysis. For example, if you write [lema="brindar"] in the search box and request this distribution, you will be able to retrieve information about the frequency of the different tenses of the verb brindar.
Distribution of person or number (available only for Portuguese)
This option can be used to distinguish between the person or number of the lemmas under analysis. For example, if you write [lema="chave"] in the search box and request this distribution, you will be able to retrieve information about the frequency of the singular and plural forms of the noun chave.
Distribution of gender (available only for Portuguese)
This option can be used to distinguish between the feminine and masculine forms of the lemmas under analysis. For example, if you write [lema="querido"] in the search box and request this distribution, you will be able to retrieve information about the frequency of the feminine and masculine forms of the lemma querido.
Distribution of sources
This option allows you to see the frequency of your search string in the different texts of the corpus. It can help you find out how many times a particular author or translator used a given form.
Distribution in original and translated text
This option allows you to see the frequency of your search string in source texts and in translations. It can be used to find out about differences between translational and non-translational language.
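Because text codes beginning with E have English source texts and codes beginning with P have Portuguese ones (see step 3.4), whether a hit comes from an original or from a translation can be read off its code and the language of the side it occurs on. The sketch below applies that rule to the English-side hits for finally in the six example concordances listed under the alignment constraints in step 2.1. side_status() is a hypothetical helper, and the counts describe those examples only, not the corpus.

```python
from collections import Counter


def side_status(code, side):
    """Classify one side ('en' or 'pt') of a hit as 'original' or 'translation'.

    Assumes the rule for text codes: codes beginning with E have an English
    source text, codes beginning with P a Portuguese one."""
    source_language = {"E": "en", "P": "pt"}[code[0]]
    return "original" if side == source_language else "translation"


# English-side hits for "finally" in the six example concordances in step 2.1.
HIT_CODES = ["EBDL2(800)", "EBDL3T1(889)", "EURZ1(758)",
             "EBIM3(702)", "EBJB3(395)", "PBAA2(775)"]

status_counts = Counter(side_status(code, "en") for code in HIT_CODES)
print(status_counts)  # Counter({'original': 5, 'translation': 1})
```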
Combined distribution of Portuguese and English search expressions
(Please note that in earlier versions of COMPARA the combined distribution was called quantitative wrap-up.)
This option is useful when your query involves an alignment constraint. For example, you may want to retrieve the word sim in the Portuguese part of the corpus and see only the concordances where the word yes appears in the corresponding text in English. The combined distribution tells you not only how many instances of sim were found, but also how many instances of sim matched occurrences of yes.
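One plausible reading of the combined distribution is sketched below: count every instance of sim, and separately count the instances that occur in alignment units whose English side contains yes. The aligned units are invented, the text is lowercased for simplicity (an assumption; CQP itself is case-sensitive by default), combined_distribution() is a hypothetical name, and COMPARA's actual counting may differ.

```python
import re

# Made-up aligned units (Portuguese, English), for illustration only.
ALIGNED = [
    ("Sim, claro.", "Yes, of course."),
    ("Disse que sim.", "He said he would."),
    ("Sim, sim!", "Yes, yes!"),
    ("Não sei.", "I don't know."),
]


def tokens(text):
    """Crude lowercase tokenizer (an assumption made for this sketch)."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def combined_distribution(units, pt_word, en_word):
    """Return (instances of pt_word, instances of pt_word in units whose English side has en_word)."""
    total = matched = 0
    for portuguese, english in units:
        n = tokens(portuguese).count(pt_word)
        total += n
        if en_word in tokens(english):
            matched += n
    return total, matched


print(combined_distribution(ALIGNED, "sim", "yes"))  # (4, 3)
```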
More Help?
Do you need more help? Try our Tutorial or, if you can't find what you need there, do get in touch.