Págico - Português Mágico

Evaluating Wikipedia-based information retrieval in Portuguese

Linguateca



What is Págico?

Págico is an evaluation contest for information retrieval in Portuguese. Its goal is to evaluate systems that find non-trivial answers to complex information needs expressed in Portuguese. It is a follow-up to GikiCLEF: it builds on our previous experience but focuses on a specific cultural sphere (the Portuguese-speaking one) instead of on cross-linguality or geographical subjects.

Our intention is to move beyond the prototype level and engage in tasks of undeniable practical and cultural interest: automatically answering information needs that would require browsing hundreds or even thousands of encyclopedia pages, and which are therefore hard, if not impossible, for a human being to satisfy.

Although we are well aware of the biases and weaknesses of Wikipedia as a whole and of the Portuguese Wikipedia in particular, we believe that this initiative may even be able to contribute to its improvement or at least to a critical identification of its strengths and weaknesses.

See the more detailed page about the motivation for Págico, as well as the page on the workshop (in Portuguese) and Cartola, the freely available package of Págico's results.


Venue and calendar

The final meeting of Págico was a workshop associated with Propor 2012, held in Coimbra on 17 April 2012.

Unlike previous evaluation contests, this year we also accepted human participants. In other words, people competed against automated systems and produced results that were evaluated.

Human participants could register even during the evaluation, until 30 November 2011. Registration for systems was open until 30 July 2011, when participants were given detailed instructions and examples of topics and answers.

The evaluation proper started on 4 November 2011 and ran until 30 November for human participants and until 11 November for systems. The results were delivered just after the end of the year, so that participants could write about their systems and approaches before the workshop. An edited collection, in Portuguese, was published afterwards as a special issue of the Linguamática journal:

Diana Santos, Cristina Mota, Cláudia Freitas & Luís Costa (eds.). Linguamática 4 (1), April 2012.

Task description

The goal is to find the Wikipedia pages that answer a given information need, formulated as a topic. Examples of topics and (a subset of the) answers:

Additionally, participants must provide the Wikipedia pages that support the claim that each chosen answer is indeed correct. For instance, participants must identify the page http://pt.wikipedia.org/wiki/Pedro_Nunes_(matemático) as the justification that the nonius is a breakthrough related to the Jesuit school of Coimbra.

The pages provided as answers and justifications must be selected from a static version of the Portuguese Wikipedia created by Linguateca for Págico.

For more information (in Portuguese) about both the human participation and the system participation, please consult the FAQ about Págico.

Organizers

Págico is organized by Linguateca, in line with its mission of promoting evaluation in the computational processing of Portuguese, together with the University of Oslo (UiO), the Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio) and the Universidade de Coimbra (UC). The main organizers within Linguateca are Cristina Mota, Alberto Simões, Cláudia Freitas, Luís Costa and Diana Santos.

The underlying system, SIGA, was originally developed by Luís Miguel Cabral for GikiCLEF; several improvements have since been added to the current version.


References

The following papers give an overview of Págico and SIGA: Costa et al. (2012) and Mota et al. (2012).

See also some papers on GikiP and GikiCLEF, Págico's forerunners: Santos et al. (2009), Santos et al. (2010) and Santos & Cabral (2010).

To retrieve all GikiCLEF and Págico publications, query Linguateca's publication catalogue for publications tagged GikiCLEF or Págico.

Funding

Linguateca and Págico were funded until 31 December 2011 by the following funding agencies:

UMIC - Agência para a Sociedade do Conhecimento
FCCN - Fundação para a Computação Científica Nacional
MCTES
FCT - Fundação para a Ciência e a Tecnologia

and continued to be supported by the following institutions after that date:

UiO
PUC-Rio
UC


Last update: 29 April 2012.

Inquiries about Págico