Building COMPARA: Annotation workflow
How to add new texts to (annotated) COMPARA
Diana Santos
This page describes how new texts are added to COMPARA, which is now syntactically annotated on the Portuguese side.
- The texts arrive in FRA format (sentence-aligned in the COMPARA style); see an example of FRA format.
- The program cria_uas2 is run, creating files in which the texts are already divided into alignment units (UA format); see an example of UA format.
- The Portuguese file is annotated by PALAVRAS (Bick 2000), yielding a file in ANOT format; see an example.
- Next, the program contas_COMPARA_anotado must be edited to add the source information for the new texts (in English and Portuguese), so that new Contents and Conteúdo pages can be generated later. The dates and language varieties must also be added manually to the corresponding hashes in biblioteca_COMPARA.pl.
- The command cria_compara_anotado is then run. Besides a new version of the CWB-encoded corpus, it produces new bilingual versions of the corpus contents (Contents/Conteudo), an updated overview of the alignment types in COMPARA (TabelaTipoAlinhamento/TabelaTypeAlignment), and the vass file, which records the new sizes of the partial corpora and must be included manually in the CGI program.
- The BuscaAvancada/AdvancedSearch pages must then be updated with the identification of the new pair(s). The CGI program paralelo.pl must also be changed so that it refers to the new pairs and uses the updated vass.
- Finally, the program actualiza_data_html is run to change the "Last update" date of the corpus in several files of the COMPARA interface.
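The last step amounts to a search-and-replace of one date across several interface pages. The Python sketch below illustrates the idea only; it is not actualiza_data_html itself. It assumes, hypothetically, that each page shows the date as "Last update: DD/MM/YYYY"; the real pages may use a different label or date layout, and the function names are illustrative.

```python
import re
from datetime import date

# ASSUMPTION: pages display the date as "Last update: DD/MM/YYYY".
# The real COMPARA pages may use a different marker or date format.
LAST_UPDATE = re.compile(r"(Last update:\s*)\d{1,2}/\d{1,2}/\d{4}")


def update_last_update(html: str, new_date: date) -> tuple[str, int]:
    """Replace every 'Last update' date in one page; return (text, count)."""
    stamp = new_date.strftime("%d/%m/%Y")
    # Keep the label (group 1) and swap only the date that follows it
    return LAST_UPDATE.subn(lambda m: m.group(1) + stamp, html)


def update_pages(pages: dict[str, str], new_date: date) -> dict[str, str]:
    """Apply the update to several pages, given as a mapping name -> HTML."""
    updated = {}
    for name, html in pages.items():
        new_html, n = update_last_update(html, new_date)
        if n == 0:
            # Flag pages where the assumed marker was not found
            print(f"warning: no 'Last update' marker in {name}")
        updated[name] = new_html
    return updated


if __name__ == "__main__":
    pages = {"index.html": "<p>Last update: 3/5/2004</p>"}
    print(update_pages(pages, date(2005, 1, 31))["index.html"])
    # <p>Last update: 31/01/2005</p>
```

Reporting pages where nothing was replaced helps catch files whose layout differs from the assumed marker, rather than leaving an old date behind unnoticed.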
A new version of (annotated) COMPARA can then be installed, and manual checking of the syntactic annotation can begin.
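Because the workflow mixes programs that are run with files that must be edited by hand, it can help to track it as an ordered checklist. The Python sketch below encodes the steps in the order given on this page; the names Step, WORKFLOW and next_steps, and the split into "run" versus "manual" steps, are hypothetical illustrations and not part of the COMPARA tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str     # program or page involved, as named on this page
    manual: bool  # True if the step is an edit done by hand
    output: str   # what the step produces or changes


# Order follows the workflow described above; the fields are illustrative.
WORKFLOW = [
    Step("cria_uas2", False, "UA files (texts divided into alignment units)"),
    Step("PALAVRAS", False, "ANOT file for the Portuguese side"),
    Step("contas_COMPARA_anotado", True, "source information (English and Portuguese)"),
    Step("biblioteca_COMPARA.pl", True, "dates and language varieties in hashes"),
    Step("cria_compara_anotado", False, "CWB corpus, Contents/Conteudo, alignment table, vass"),
    Step("BuscaAvancada/AdvancedSearch", True, "identification of the new pairs"),
    Step("paralelo.pl", True, "references to the new pairs and updated vass"),
    Step("actualiza_data_html", False, "'Last update' date in the interface"),
]


def next_steps(done: set[str]) -> list[Step]:
    """Return the remaining steps in order, rejecting out-of-order progress."""
    remaining = []
    for step in WORKFLOW:
        if step.name in done:
            if remaining:
                # A later step was marked done while an earlier one is pending
                raise ValueError(f"{step.name} done before {remaining[0].name}")
            continue
        remaining.append(step)
    return remaining


if __name__ == "__main__":
    for step in next_steps({"cria_uas2", "PALAVRAS"}):
        kind = "edit by hand" if step.manual else "run"
        print(f"{kind}: {step.name} -> {step.output}")
```

Raising an error on out-of-order progress reflects the dependencies described above; for example, paralelo.pl can only receive the updated vass after cria_compara_anotado has produced it.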