I have worked in natural language processing and language engineering since 1987.
My main achievement was launching a distributed resource center for the processing of the Portuguese language, Linguateca (2000-), as a follow-up to the Computational Processing of Portuguese project (1998-2000).
In this connection, I have been involved in the organization of several evaluation contests for Portuguese (Morfolimpíadas, HAREM, GikiCLEF, Págico) as well as in adding Portuguese to CLEF, the main international forum for crosslingual information retrieval.
Also in the scope of Linguateca, I have deployed or helped deploy several important corpus resources for Portuguese, such as AC/DC, COMPARA, the Floresta Sintá(c)tica treebank, and CorTrad. I have also supervised the creation of CETEMPúblico, CETENFolha and the CHAVE and GIRA collections for IR evaluation, as well as the development of the Esfinge QA system.
To learn more about my background, see a short CV or my more than 430 publications.
See also my interests page, written in 1997 and slightly updated in 2003.
For NLP, I believe in language communities (all those who speak Portuguese), not geographical communities (such as Iberian, European or Latin-American).
I believe that people should teach and learn in their own native language, and that publishing science only in English is fundamentally wrong (for non-native English speakers). Scientists have the duty to translate and mediate science in their own language, instead of betraying it.