Diana Santos

My main interests in brief.

Machine Translation

I was responsible for the development of PORTUGA (Mentor/P), a broad-coverage MT prototype from English to Portuguese. It was developed at the IBM-INESC Scientific Group from 1987 to 1989.

Some relevant publications are:
Santos, Diana.
"Lexical gaps and idioms in Machine Translation", Hans Karlgren (ed.), Proceedings of COLING'90 (Helsinki, August 1990), Vol 2, pp.330-5.
Santos, Diana.
"Broad-coverage machine translation", in K. Jensen, G. Heidorn & S. Richardson, Natural Language Processing: The PLNLP Approach, Kluwer Academic Press, 1992.

Computational processing of Portuguese

Unfortunately, Portuguese is well behind the other major languages of the world as far as its computational processing is concerned. My efforts in the field have included:

Some relevant publications are:

Medeiros, José Carlos, Rui Marques & Diana Santos.
"Português Quantitativo", Actas do 1.o Encontro de Processamento de Língua Portuguesa (Escrita e Falada) - EPLP'93, (Lisboa, 25-26 de Fevereiro de 1993), pp.33-8.
Barreiro, Anabela, Maria de Jesus Pereira & Diana Santos.
"Critérios e opções linguísticas no desenvolvimento do Palavroso, um sistema computacional de descrição morfológica do português", Relatório INESC num. RT/54-93, Dezembro de 1993.
Santos, Diana.
"Português Computacional", Actas do Congresso Internacional sobre o Português (Lisboa, 11-15 de Abril de 1994), Vol. 3, pp.167-184.
See also my activity in the project Computational Processing of Portuguese.


I hold the following standpoints regarding semantics:

Some relevant publications are:

Santos, Diana.
"On the use of parallel texts in the comparison of languages", Actas do XI Encontro da Associação Portuguesa de Linguística (Lisboa, 2-4 de Outubro de 1995), pp.217-239.
Santos, Diana Maria de Sousa Marques Pinto dos.
"Tense and aspect in English and Portuguese: a contrastive semantical study", Tese de doutoramento, Instituto Superior Técnico, Universidade Técnica de Lisboa, Junho 1996.
Santos, Diana.
"The importance of vagueness in translation: Examples from English to Portuguese", Romansk Forum Nr. 5, Juni 1997, pp.43-69.

Corpus processing

I hold the (widely held) belief that corpora are an excellent means of looking at language, but that they are not a solution in themselves. In other words, methodological questions are among the most interesting subjects in corpus processing.

Some questions are:

Some relevant publications (not covering all the aspects above, though) are:
Bacelar do Nascimento, Maria Fernanda, Amália Mendes & Diana Santos.
"O corpus e a classificação sintáctica dos verbos", Actas do 1.o Encontro de Processamento de Língua Portuguesa (Escrita e Falada) - EPLP'93, (Lisboa, 25-26 de Fevereiro de 1993).
Santos, Diana.
"Bilingual alignment and tense", Proceedings of the Second Annual Workshop on Very Large Corpora (Kyoto, August 4th, 1994), extended version as INESC Report AR/10-94.
Santos, Diana.
"On grammatical translationese", in Short papers presented at the Tenth Scandinavian Conference on Computational Linguistics (Helsinki, 29-30th May 1995), compiled by Kimmo Koskenniemi, pp.59-66.


Since 1999, my favourite subject has been evaluation: I have been working hard to bring the "evaluation contest" paradigm home to the Portuguese language processing community.

Net-based NLP services

My aim is to use the Web to make tools and language resources available, minimizing adaptation time for new users and focussing on the fundamental questions of user support.

The service for the Oslo Corpus of Bosnian Texts (OCBT) was created and implemented by me, in the framework of the net-based services provided by the Text laboratory.

A similar, though more ambitious, service is the one providing access to Portuguese corpora, the AC/DC project.

Relevant publications are:

Santos 98b
Santos, Diana. "Providing access to language resources through the World Wide Web: the Oslo Corpus of Bosnian Texts". In Antonio Rubio, Natividad Gallardo, Rosa Castro and Antonio Tejada (eds.), Proceedings of The First International Conference on Language Resources and Evaluation (Granada, 28-30 May 1998), Vol. 1, pp.475-481.
Santos 99b
Santos, Diana. "Disponibilização de corpora através da WWW". Actas do I Workshop sobre Linguística Computacional da Associação Portuguesa de Linguística (Lisboa, 25-27 de Maio de 1998), APL, 1999.

Contrastive studies

I see contrastive studies as a way to reach a deeper understanding both of each language and of translation between them.

Basically, I am after methodologies to perform corpus-based contrastive studies. I am also interested in studying other languages' influence on my own.

In addition to my PhD thesis, relevant publications are:

Santos 97b
Santos, Diana. "O tradutês na literatura infantil traduzida em Portugal", Actas do XIII Encontro da Associação Portuguesa de Linguística (Lisboa, 1-3 de Outubro de 1997).
Santos 98c
Santos, Diana. "Perception verbs in English and Portuguese". In Johansson, Stig and Signe Oksefjell (eds.), Corpora and Crosslinguistic Research: Theory, Method, and Case Studies. Amsterdam: Rodopi, pp.319-342.
Santos 99a
Santos, Diana. "The Pluperfect in English and Portuguese: What Translations Patterns Show". In Hilde Hasselgaard & Signe Oksefjell (eds.), Out of Corpora: Studies in Honour of Stig Johansson, Amsterdam: Rodopi, pp.283-299.
Santos 99c
Santos, Diana. "Um olhar computacional sobre a tradução". Terminología y Traducción 2/99.
Santos and Oksefjell forthcoming
Santos, Diana & Signe Oksefjell. "Using a translation corpus to validate independent claims", Languages in Contrast.

Research policy in NLP

Having worked as an NLP group leader for quite a while, I am also interested in general questions of research policy. After describing some problems with the way NLP research is organized in Portugal, I suggested some ways forward in a white paper and made practical suggestions for collaborative work in several documents created in the Computational Processing of Portuguese project, now Linguateca.

Information retrieval

I believe that Web information retrieval is the best field in which to apply both NLP and evaluation techniques, in a real-world, "man in the street" context.

Last modified on 4 April 2003