Workshop on Language Resources for Teaching and Research

Wednesday, 23 April 2008

Faculdade de Letras da Universidade do Porto

10.00-11.00	Max Silberztein - "NooJ's Text Annotation Structure"
	NooJ associates each text with a Text Annotation Structure, in which each recognized linguistic unit is represented by an annotation. Annotations store the position of the text units to be represented, their length, and linguistic information. NooJ can represent and process complex annotations, such as those that represent units inside word forms, as well as those that are discontinuous. We demonstrate how to use NooJ's morphological, lexical, and syntactic tools to formalize and process these complex annotations.
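	A minimal sketch of the idea, assuming an annotation can be modelled as a list of (offset, length) spans plus a dictionary of linguistic properties. The class names, the property keys and the "cannot" split are hypothetical illustrations, not NooJ's internal format or API. A unit inside a word form gets a span that covers only part of the word, and a discontinuous unit gets more than one span.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One linguistic unit; a discontinuous unit has several (offset, length) spans."""
    spans: list  # list of (character offset, length) pairs
    info: dict = field(default_factory=dict)  # hypothetical linguistic properties

    def surface(self, text: str) -> str:
        # Join the pieces of the text covered by each span.
        return " ".join(text[start:start + length] for start, length in self.spans)


@dataclass
class TextAnnotationStructure:
    """Hypothetical stand-in: a text plus the annotations laid over it."""
    text: str
    annotations: list = field(default_factory=list)

    def add(self, spans, **info) -> Annotation:
        ann = Annotation(list(spans), dict(info))
        self.annotations.append(ann)
        return ann

    def at(self, offset: int) -> list:
        # All annotations that have a span covering the given character offset.
        return [a for a in self.annotations
                if any(s <= offset < s + n for s, n in a.spans)]


tas = TextAnnotationStructure("We cannot turn the light off.")
tas.add([(3, 3)], lemma="can", cat="V")      # unit inside the word form "cannot"
tas.add([(6, 3)], lemma="not", cat="ADV")    # second unit inside the same word form
pv = tas.add([(10, 4), (25, 3)], lemma="turn off", cat="V")  # discontinuous unit
print(pv.surface(tas.text))                  # turn off
print([a.info["lemma"] for a in tas.at(7)])  # ['not']
```

	Storing positions and lengths rather than copies of the strings lets several overlapping annotations share the same stretch of text.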
11.00-11.30	Coffee Break
11.30-12.00	Anabela Barreiro - "Port4NooJ: Portuguese Linguistic Module and Bilingual Resources for Machine Translation"
	This presentation will focus on Port4NooJ, the open-source NooJ Portuguese linguistic module, which integrates a bilingual extension for Portuguese-English machine translation (work in progress). It describes the main components of the module, in particular the electronic dictionaries, the rules that formalize and document Portuguese inflection and derivation, and the different types of grammar: morphological, disambiguation, syntactic-semantic, multiword-expression and translation grammars. It explains how the different components interact and shows how the linguistic resources are applied to text.
12.00-12.30	Sérgio Matos, Anabela Barreiro, Belinda Maia - "Corpógrafo and NooJ: using linguistic resources to obtain aligned concordances from corpora"
	In this presentation we will describe the integration of NooJ within the Corpógrafo environment. We will demonstrate how the NooJ corpus processing engine and linguistic resources are used in Corpógrafo for extracting lexical bundles and for obtaining simple concordances from corpora and aligned concordances from parallel corpora, and how we are implementing the search for "parallel" concordances in comparable corpora.
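	As a rough illustration of the two kinds of output mentioned above, the sketch below builds simple keyword-in-context (KWIC) concordance lines and an aligned concordance, assuming the parallel corpus is already sentence-aligned and stored as (source, target) pairs. It uses plain regular expressions in place of NooJ's grammars, and the function names and toy sentences are hypothetical; this is not Corpógrafo's code.

```python
import re


def kwic(text, pattern, width=20):
    """Keyword-in-context lines for every regex match in the text."""
    lines = []
    for m in re.finditer(pattern, text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


def aligned_concordance(pairs, pattern):
    """Return the (source, target) sentence pairs whose source side matches."""
    rx = re.compile(pattern)
    return [(src, tgt) for src, tgt in pairs if rx.search(src)]


pairs = [  # invented toy sentence pairs
    ("O corpo humano tem ossos.", "The human body has bones."),
    ("A casa é grande.", "The house is big."),
]
for line in kwic("the cat sat on the mat", r"\bthe\b", width=4):
    print(line)
print(aligned_concordance(pairs, r"\bcorpo\b"))
```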
12.30-14.30	Working lunch
14.30-15.00	Belinda Maia - "Applications of Corpógrafo and NooJ to Teaching and Research"
	This presentation will focus on the applications of Corpógrafo and NooJ to teaching and research. We shall begin by explaining the earlier word- and n-gram-based functionality of Corpógrafo, such as the identification of term candidates, multiword expressions and lexical bundles, and the identification of discourse markers for research in discourse analysis. We shall then discuss the need for cooperation and feedback between these methods and those based on PoS tagging and syntactic analysis. The session will close with presentations of supporting or alternative points of view, followed by discussion.
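	Lexical bundles are usually defined as recurrent word sequences that exceed a frequency threshold and occur in more than one text. The sketch below extracts them under that assumption, with whitespace tokenization and thresholds chosen only for illustration; the function names are hypothetical and this is not Corpógrafo's implementation.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token sequences, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_bundles(docs, n=3, min_freq=2, min_docs=2):
    """Recurrent n-grams: frequent overall and spread over several documents."""
    freq = Counter()      # total occurrences across the corpus
    doc_freq = Counter()  # number of documents containing each n-gram
    for doc in docs:
        grams = ngrams(doc.lower().split(), n)
        freq.update(grams)
        doc_freq.update(set(grams))
    return {" ".join(g): c for g, c in freq.items()
            if c >= min_freq and doc_freq[g] >= min_docs}


docs = ["On the other hand the results", "on the other hand we found",
        "at the end of the day"]
print(lexical_bundles(docs))  # {'on the other': 2, 'the other hand': 2}
```

	Requiring a minimum document count filters out sequences that are frequent only because one text repeats them.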
15.00-15.20	Alberto Simões - "Examples Extraction for Machine Translation"
	This presentation will focus on techniques for extracting bilingual resources for machine translation, with particular emphasis on the extraction of translation examples. It will include a brief experiment on the use of these resources for hybrid machine translation.
15.20-15.40	Claudia Freitas - "Exploring Portuguese Syntax with Floresta Sintáctica"
	This presentation will describe Floresta Sintáctica, a syntactic treebank for Portuguese. Some new linguistic features will be presented, together with examples of how Floresta can be used to explore aspects of Portuguese syntax. Floresta's new interface, Milhafre (work in progress), will also be shown.
15.40-16.00	Hugo Oliveira - "PAPEL: A lexical ontology for Portuguese"
	PAPEL is a lexical resource for natural language processing (NLP) of Portuguese that is being built by Linguateca by processing a major commercial Portuguese dictionary, the Dicionário da Língua Portuguesa (DLP), developed and owned by Porto Editora, the largest Portuguese dictionary publisher. To our knowledge, PAPEL is the first lexical ontology for Portuguese built by semi-automatic means. We are currently working on the semantic relations CAUSADOR-DE/RESULTADO-DE (cause-of/result-of), TODO-DE/PARTE-DE (whole-of/part-of) and MEIO-PARA/FINALIDADE-DE (means-for/purpose-of), which can be extracted from dictionary definitions using the PEN parser and specific grammars consisting of string patterns.
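	To show what extracting a relation with a string pattern can look like, here is a minimal sketch that matches the start of a dictionary definition with plain regular expressions. PAPEL uses the PEN parser and its own grammars; the patterns, the relation direction (headword, relation, term) and the sample definitions below are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical patterns (not PAPEL's actual grammars): each maps the start of a
# lower-cased Portuguese definition to a relation and captures the related term.
PATTERNS = [
    (re.compile(r"^parte d[eoa]s? (?:uma? )?(\w+)"), "PARTE-DE"),
    (re.compile(r"^resultado d[eoa]s? (?:uma? )?(\w+)"), "RESULTADO-DE"),
    (re.compile(r"^(?:instrumento|meio) para (\w+)"), "MEIO-PARA"),
]


def extract(headword, definition):
    """Return a (headword, relation, term) triple, or None if no pattern fits."""
    for rx, relation in PATTERNS:
        m = rx.match(definition.lower())
        if m:
            return (headword, relation, m.group(1))
    return None


print(extract("pétala", "parte da flor"))                  # ('pétala', 'PARTE-DE', 'flor')
print(extract("martelo", "Instrumento para bater pregos"))  # ('martelo', 'MEIO-PARA', 'bater')
```

	Because Python's `\w` matches Unicode letters, accented Portuguese words are captured without extra handling.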
16.00-16.30	Coffee break
16.30-16.50	Susana Inácio - "Colouring COMPARA: contrastive and monolingual colour studies in English and Portuguese"
	We will describe the English and Portuguese colour studies carried out with COMPARA (www.linguateca.pt/COMPARA/), which resulted from the semantic annotation of colour in the corpus. The aim of these studies is to analyse how English- and Portuguese-speaking authors use colour by quantifying data, identifying patterns and tendencies (including how colour use varies over time) and contrasting the findings. Taking advantage of the fact that COMPARA is syntactically analysed (automatically by the PALAVRAS parser, then manually revised and documented, for the Portuguese part, and automatically by CLAWS for the English part), this paper will include a colour-related morphosyntactic analysis of both English and Portuguese.
16.50-17.10	José Paulo Tavares - "NooJ Latin module"
	This presentation will introduce electronic resources designed for the automatic processing of Latin, which can be used for teaching and/or research. We will also briefly describe the NooJ Latin module.
17.10-18.00	Summing up and discussion of future projects

Last updated: 15 May 2008