Águia, a tool for searching the Floresta treebank

Floresta sintá(c)tica project



Search in

Bosque, a subset of Floresta, fully revised by the linguistic team (version 7.5, 12 December 2007): 9,431 trees, corresponding to 1,962 extracts of CETEMPúblico and CETENFolha, 9,368 distinct sentences, 215,003 tokens and ca. 184,773 words
Floresta virgem, unrevised Floresta (version 2.1, 16 March 2005): 78,246 trees automatically created from the CG output of the PALAVRAS parser, corresponding to the first million words of each of the CETEMPúblico and CETENFolha corpora. NB: Floresta virgem includes the contents of Bosque without manual revision.

Kind of result

Concordance
Lemma distribution
(Word's) function distribution
Part of speech distribution
Phrase distribution
Phrase distribution of immediate constituents
(Phrase's) function distribution
Function distribution of immediate constituents
Text distribution
Size distribution

Look for:

Help

We are still experimenting with the user interface and warmly encourage user feedback. We are also developing a guided tour to make the tool easier to understand.

Use the tables below for an idea of the kinds of search criteria already available. Note that, for the moment, the functions whose names start with /ass require exactly one space at the end of their regular expression argument. We hope to improve the usability of this interface soon.
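The trailing-space requirement is easy to get wrong when typing a query by hand. As a minimal sketch, assuming queries are prepared in a script before being pasted into the query window, the hypothetical helper below (its name is not part of Águia) normalizes a regular expression argument so that it ends in exactly one space:

```python
def with_one_trailing_space(pattern: str) -> str:
    """Return the pattern ending in exactly one space.

    Hypothetical helper, not part of the tool: it strips any spaces
    already at the end and appends a single one.
    """
    return pattern.rstrip(" ") + " "


# Illustrative arguments only; they are not taken from the search tables.
print(repr(with_one_trailing_space("np")))
print(repr(with_one_trailing_space("np   ")))
```

Tabs and other whitespace are deliberately left untouched, since the note above speaks only of a space.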

Concordance request

Distribution request

When you ask for a distribution, you are actually searching in another corpus whose terminals are phrases. Your search expressions should thus look for things like "np" or [funcao="ACC"], while the kind of result is determined by the kind of distribution you selected.

For example, you can find out what kinds of phrasal subjects (in terms of their constituents) there are in the treebank by selecting "phrase distribution" and entering [funcao="SUBJ"] in the query window. You would get the size distribution instead if you had selected "size distribution".

You may, on the other hand, look at the functions of PPs in the corpus by selecting "function distribution" and simply entering "pp" in the query window. Or you may look at the actual words in the PPs, for which you would choose "text distribution".
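The split between the query (which phrases are selected) and the kind of distribution (which property of them is counted) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, field names and the distribution function are hypothetical, the toy data is invented rather than taken from Bosque, and the real tool may report richer constituent patterns than a single phrase label.

```python
from collections import Counter

# Hypothetical toy records standing in for phrase terminals; not real Bosque data.
PHRASES = [
    {"phrase": "np", "funcao": "SUBJ", "text": "o governo", "size": 2},
    {"phrase": "np", "funcao": "ACC", "text": "a proposta", "size": 2},
    {"phrase": "pp", "funcao": "ADVL", "text": "em Lisboa", "size": 2},
    {"phrase": "pp", "funcao": "N<", "text": "de a cidade", "size": 3},
    {"phrase": "fcl", "funcao": "SUBJ", "text": "que ele venha", "size": 3},
    {"phrase": "pp", "funcao": "ADVL", "text": "em 1998", "size": 2},
]


def distribution(records, criterion, result_field):
    """Select records matching the criterion, then count one field of them."""
    selected = [r for r in records if criterion(r)]
    return Counter(r[result_field] for r in selected)


# Analogue of [funcao="SUBJ"] with "phrase distribution".
subj_phrases = distribution(PHRASES, lambda r: r["funcao"] == "SUBJ", "phrase")

# Analogue of "pp" with "function distribution" and with "text distribution".
pp_functions = distribution(PHRASES, lambda r: r["phrase"] == "pp", "funcao")
pp_texts = distribution(PHRASES, lambda r: r["phrase"] == "pp", "text")

print(subj_phrases.most_common())
print(pp_functions.most_common())
print(pp_texts.most_common())
```

The same selection step feeds every kind of distribution; only the counted field changes, which mirrors choosing a different entry under "Kind of result" while keeping the query fixed.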

To be added


Detailed quantitative data on current Bosque

Category                                Count
clauses                                22,029
  finite                               15,572
  non-finite                            5,693
  averbal                                 764
noun phrases                           59,878
prepositional phrases                  32,753
adjectival phrases                      9,447
adverbial phrases                         975
conjuncts                               5,507
trees                                   9,437
sentences with more than one tree          66
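As a quick consistency check on the figures above, the three clause types can be summed and compared with the clause total. The short sketch below uses only the numbers listed for Bosque; the variable names are illustrative.

```python
# Clause counts as listed in the quantitative data for Bosque.
clauses = 22029
subtypes = {"finite": 15572, "non-finite": 5693, "averbal": 764}

# The subtype counts should add up to the clause total.
total = sum(subtypes.values())
print(total == clauses)

# Share of each clause type, as a percentage rounded to one decimal place.
shares = {name: round(100 * n / clauses, 1) for name, n in subtypes.items()}
for name, share in shares.items():
    print(f"{name}: {share}%")
```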


Last update: 4 January 2008.
Comments and suggestions about the Floresta treebank