Diana Santos

I have worked in natural language processing and language engineering since 1987.

My main achievement was launching a distributed resource center for the processing of the Portuguese language, Linguateca (2000-), as a follow-up to the Computational Processing of Portuguese project (1998-2000).

In this connection, I have been involved in the organization of several evaluation contests for Portuguese (Morfolimpíadas, HAREM, GikiCLEF, Págico) as well as in adding Portuguese to CLEF, the main international forum for crosslingual information retrieval.

Also in the scope of Linguateca, I have deployed or helped deploy several important corpus resources for Portuguese, such as AC/DC, COMPARA, the Floresta Sintá(c)tica treebank, and CorTrad. I have also supervised the creation of CETEMPúblico, CETENFolha and the CHAVE and GIRA collections for IR evaluation, as well as the development of the Esfinge QA system.

To learn more about my past work, see a short CV or my more than 330 publications.

Scientific interests

I have worked in corpus processing and analysis, morphological analysis, parsing, tense and aspect modelling, contrastive studies, machine translation, alignment, and Web interfaces to corpora. I am currently also interested in question answering, information retrieval, unobtrusive usability studies and the future of the Web: Web 2.0 and Web services.

See also my interests page, written in 1997 and slightly updated in 2003.

(Science) political views

I do not believe in anonymous reviews. See why.

I believe (for NLP) in language communities (all those who speak Portuguese), not geographical communities (such as Iberian, European or Latin-American).

I believe that people should teach and learn in their own native language, and that scientific publishing in English only is fundamentally wrong (for non-native English speakers). Scientists have the duty to translate and mediate science in their own language, instead of betraying it.

Last modified: 22 May 2011