File download

Floresta Sintá(c)tica Project

From this page you can download the latest versions of the four Floresta Sintá(c)tica components: Amazônia, Selva, Floresta Virgem and Bosque. The formats available for each component are listed below.

Amazônia

Dependency format: amaz1.dep, amaz2.dep, amaz3.dep, amaz4.dep, amaz5.dep, amaz6.dep
Phrase structure format: amaz1.ad, amaz2.ad, amaz3.ad, amaz4.ad, amaz5.ad, amaz6.ad
CoNLL format: amazonia.conll.gz
Metadata information: amaz1.txt, amaz2.txt, amaz3.txt, amaz4.txt, amaz5.txt, amaz6.txt
Last change: 3 July 2010, version 2.2
License: Creative Commons: Attribution-Noncommercial-Share Alike

Selva

Dependency format: selva_cien.dep, selva_fala.dep, selva_lit.dep
Last change: 3 November 2008, version 1.0
Phrase structure format: selva_cien.ad, selva_fala.ad, selva_lit.ad
Last change: 3 November 2008, version 1.0
CoNLL format: selva_cien.conll.gz, selva_fala.conll.gz, selva_lit.conll.gz
Metadata information: to be provided.
Revision information: in Portuguese

Floresta Virgem

Dependency format: FV_CP.dep, FV_CF.dep
Last change: 6 October 2008, version 3.0
Phrase structure format: FV_CP.ad, FV_CF.ad
Last change: 6 October 2008, version 3.0
CoNLL format: FlorestaVirgem_CP.conll.gz, FlorestaVirgem_CF.conll.gz

Bosque

Dependency format, CGD*: BosqueCP.cgd, BosqueCF.cgd
Last change: 14 September 2006, version 7.4
Dependency format, CGDE*: Bosque_CP_7.5_cgde_2203216.gz, Bosque_CF_7.5_cgde_2203216.gz
Last change: 22 March 2016, version 7.5
Phrase structure format: BosqueCP.ad, BosqueCF.ad
Last change to the content: 6 October 2008, version 8.0
CoNLL format, converted directly from the CGD files: Bosque_CP_7.4.conll.gz, Bosque_CF_7.4.conll.gz, Bosque_CP_7.5_cgde_22032016.conll.gz, Bosque_CF_7.5_cgde_22032016.conll.gz
Dependency Bosque 7.5, converted to Universal Dependencies, in CoNLL format: bosque_CP.udep.conll.gz, bosque_CF.udep.conll.gz
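The CoNLL files above are gzip-compressed text. Each line holds one token with tab-separated columns, and a blank line separates sentences. The following Python sketch shows one way to read them. It makes several assumptions. The files are taken to follow the usual 10-column CoNLL-X/CoNLL-U layout, with the word form in column 2, the head in column 7 and the relation label in column 8. The text is assumed to be UTF-8; if a file fails to decode, try another encoding such as "latin-1". The function names read_conll and deprel_counts are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterator, List, Tuple

# A token as (ID, FORM, HEAD, DEPREL). Assumes the 10-column CoNLL-X /
# CoNLL-U layout; these column positions are an assumption, not documented here.
Token = Tuple[str, str, str, str]


def read_conll(path: str, encoding: str = "utf-8") -> Iterator[List[Token]]:
    """Yield one sentence at a time from a gzipped CoNLL file (hypothetical helper)."""
    sentence: List[Token] = []
    with gzip.open(path, "rt", encoding=encoding) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line.strip():
                # A blank line closes the current sentence.
                if sentence:
                    yield sentence
                    sentence = []
                continue
            if line.startswith("#"):
                # Comment lines (e.g. CoNLL-U sentence metadata) carry no tokens.
                continue
            cols = line.split("\t")
            if len(cols) < 8:
                raise ValueError(f"unexpected column count in line: {line!r}")
            if "-" in cols[0] or "." in cols[0]:
                # Skip CoNLL-U multiword ranges ("1-2") and empty nodes ("8.1").
                continue
            sentence.append((cols[0], cols[1], cols[6], cols[7]))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


def deprel_counts(path: str, encoding: str = "utf-8") -> Counter:
    """Count dependency relation labels over a whole file (hypothetical helper)."""
    counts: Counter = Counter()
    for sentence in read_conll(path, encoding):
        counts.update(deprel for _, _, _, deprel in sentence)
    return counts
```

For example, deprel_counts("bosque_CP.udep.conll.gz").most_common(10) would list the ten most frequent relation labels in the Universal Dependencies version of Bosque CP. Because multiword range lines are skipped, a contraction such as "do" is counted through its parts ("de" + "o") rather than as a single token.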

* Note: The Bosque 7.4 CGD format was obtained from the (manually revised) AD format by a complex automatic process involving several filters, which may have introduced errors and descriptive problems. We had planned to update the annotation so that both formats encode the same linguistic information. In fact, the two versions branched off: AD 8.0 and CGDE 7.5 correspond to different sets of changes made since 7.4. We use the extension CGDE to mark the dependency version that resulted from this divergence.
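Because the CGD 7.4 and CGDE 7.5 releases diverge, one way to see what changed is to diff their CoNLL conversions. For example, you could compare Bosque_CP_7.4.conll.gz against Bosque_CP_7.5_cgde_22032016.conll.gz. The Python sketch below is only an illustration and rests on several assumptions. Both files are taken to use the 10-column CoNLL layout, with FORM in column 2, HEAD in column 7 and DEPREL in column 8. Both are assumed to decode as UTF-8. Sentences are matched by their sequence of word forms. The helper names load_arcs and changed_sentences are hypothetical.

```python
import gzip
from itertools import chain
from typing import Dict, List, Tuple

Arcs = List[Tuple[str, str]]  # one (HEAD, DEPREL) pair per token


def load_arcs(path: str, encoding: str = "utf-8") -> Dict[Tuple[str, ...], Arcs]:
    """Map each sentence, keyed by its word forms, to its dependency arcs.

    Assumes the 10-column CoNLL layout (FORM = col 2, HEAD = col 7,
    DEPREL = col 8). Identical sentences collapse onto a single key.
    """
    table: Dict[Tuple[str, ...], Arcs] = {}
    forms: List[str] = []
    arcs: Arcs = []
    with gzip.open(path, "rt", encoding=encoding) as handle:
        # The extra "\n" is a sentinel so the final sentence is always stored.
        for line in chain(handle, ["\n"]):
            if not line.strip():
                if forms:
                    table[tuple(forms)] = arcs
                    forms, arcs = [], []
                continue
            if line.startswith("#"):
                continue  # comment/metadata line
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 8:
                raise ValueError(f"unexpected column count in line: {line!r}")
            if "-" in cols[0] or "." in cols[0]:
                continue  # multiword range or empty node (CoNLL-U only)
            forms.append(cols[1])
            arcs.append((cols[6], cols[7]))
    return table


def changed_sentences(old_path: str, new_path: str,
                      encoding: str = "utf-8") -> List[Tuple[str, ...]]:
    """Return sentences found in both files whose HEAD/DEPREL annotation differs."""
    old = load_arcs(old_path, encoding)
    new = load_arcs(new_path, encoding)
    return [s for s in old if s in new and old[s] != new[s]]
```

Matching on word forms keeps the comparison independent of sentence order. However, a sentence whose tokenization changed between versions will appear to be missing from one file rather than changed, and repeated identical sentences are merged into one entry.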

Several further formats can also be found on the Floresta page of the former Linguateca node in Braga, which is still maintained by the Natura project.


Last update of this page: 23 March 2016.
Comments and suggestions about the Floresta treebank