GikiCLEF - Cross-language Geographic Information Retrieval from Wikipedia

Revision as of 09:40, 12 November 2008

GikiCLEF 2009: Cross-language Geographic Information Retrieval from Wikipedia

GikiCLEF 2009 is an evaluation task at CLEF 2009, succeeding the GikiP 2008 pilot task. The task is being co-organized by (the list is in alphabetical order by last name, and it is not yet complete):

Call for participation


We see GikiCLEF as a joint evaluation task, in which all participants may help improve the task so that it suits their needs. We are currently inviting all participants to join the GikiCLEF mailing list through the following form.

GikiCLEF 2009 task description


GikiCLEF intends to evaluate systems on finding Wikipedia entries / documents that answer a particular information need which requires geographical reasoning of some sort.

GikiCLEF participants must build systems capable of answering a set of geographically challenging topics using Wikipedia collections, returning for each topic a list of document URIs that contain the correct answers. The exact form of the answers is still an open subject, as the topics might require, for example, a list of answers ordered by date, location or a given storyline. Examples of the GikiP 2008 topics include:

  1. Which African capitals have more than two million inhabitants?
  2. List places where Goethe lived.
  3. What wars occurred on Greek soil?

The GikiCLEF open issues are the following:

Languages

GikiCLEF 2009 will use Bulgarian, Dutch, English, German, Italian, Norwegian, Portuguese, Romanian and Spanish for both topics and collections. Participants can suggest other languages; if you want to add another language, please contact the GikiCLEF organizers.

Collection

The organizers will make the Wikipedia static dumps for all GikiCLEF languages available, so that participants can pre-process the new collections and test their systems on them. We intend to release the collections by the end of 2008. The Wikipedia snapshots will be converted to XML with the WikiXML tool.

Topics

There will be 50 topics for GikiCLEF 2009, spanning several kinds and different cultures, to cover all GikiCLEF collections. The topics will be released in early March 2009; after the release, the systems will have 2 weeks to return a list of answers.

We encourage participants to tell the GikiCLEF organizers which kinds of topics they are particularly interested in, so that the final topics may reflect those interests.

Evaluation

Only answers / documents of the correct type are expected: for example, names of people (painters and scientists), names of countries (not of wars or kings), etc. In GikiP 2008, each system's results were evaluated according to the number of correct hits (N) and precision, using the simple formula mult*N*N/total for each topic, where mult rewards multilinguality. A system's final score was the average of its individual topic scores.
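
The GikiP 2008 scoring rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the organizers' evaluation code: the function names are hypothetical, total is read as the number of answers a system returned for the topic (so that N/total is precision), mult is passed in as a plain number because its exact values are not given here, and a topic with no returned answers is assumed to score zero.

```python
from typing import Iterable, Tuple


def topic_score(correct: int, total: int, mult: float = 1.0) -> float:
    """Score one topic as mult * N * N / total.

    correct: number of correct hits (N).
    total:   number of answers returned for the topic (assumed reading).
    mult:    multilinguality reward (its values are an assumption here).
    """
    if correct < 0 or total < 0 or correct > total:
        raise ValueError("need 0 <= correct <= total")
    if total == 0:
        # Assumption: a topic with no returned answers scores zero.
        return 0.0
    return mult * correct * correct / total


def final_score(results: Iterable[Tuple[int, int, float]]) -> float:
    """Average the per-topic scores; each item is (correct, total, mult)."""
    scores = [topic_score(n, total, mult) for n, total, mult in results]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Three made-up topics: (correct hits, answers returned, mult).
    example = [(3, 4, 1.0), (2, 2, 2.0), (0, 5, 1.0)]
    print([topic_score(*t) for t in example])
    print(final_score(example))
```

Because N is multiplied by precision, returning many wrong answers lowers a topic's score even when the number of correct hits stays the same.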

For GikiCLEF 2009, we need to develop new measures for evaluating system performance in a way that encourages multilinguality and diversity of answers. Any suggestions on this subject are welcome.

Important dates


We aim for early topic development and release, and for a submission deadline earlier than those of the other CLEF tracks, to avoid the 'rush months' of the CLEF tracks. The final dates are still being decided among the organizers.

  1. 9 October 2008 - GikiCLEF mailing list open, call for participation and guideline discussion
  2. 28-30 October 2008 - Promoting GikiCLEF at the GIR workshop held at CIKM 2008, Napa Valley, CA, USA.
  3. Until the end of 2008 - Wikipedia collections made available to all participants.
  4. November 2008 - February 2009 - Final definition of the GikiCLEF task. Publication of the details of the task.
  5. March 2009 - Topic release.
  6. (2 weeks after topic release) - Deadline for run submission.
  7. June 2009 - Assessment and results made available.
  8. September 2009 - CLEF workshop in Corfu, Greece.

GikiP 2008 pilot task


The GikiP task was accepted as a pilot task for the GeoCLEF 2008 main track. The organization of GikiP (including topic development, assessment and evaluation of the results) was carried out by Linguateca.

Please visit the main page of GikiP 2008 for more information.

Acknowledgements


GikiCLEF is organized under the scope of CLEF, an activity of the TrebleCLEF Coordination Action. Other related evaluation tasks: QA@CLEF, GeoCLEF.

So far, GikiCLEF is funded by Linguateca, which is jointly funded by the Portuguese Government and the European Union (FEDER and FSE) under contract ref. POSC/339/1.3/C/NAC.

Other material



[Santos et al. 2008]
Diana Santos, Nuno Cardoso, Paula Carvalho, Iustin Dornescu, Sven Hartrumpf, Johannes Leveling & Yvonne Skalban. "Getting geographical answers from Wikipedia: the GikiP pilot at CLEF". In Francesca Borri, Alessandro Nardi & Carol Peters (eds.), Cross Language Evaluation Forum: Working Notes for the CLEF 2008 Workshop (Aarhus, Denmark, 17-19 September 2008), no page numbers. http://www.linguateca.pt/Diana/download/SantosetalWNCLEF2008.pdf
[Santos & Cardoso 2008]
Diana Santos & Nuno Cardoso. "GikiP: Evaluating geographical answers from Wikipedia". In 5th Workshop on Geographic Information Retrieval (GIR'08) (Napa Valley, CA, USA, 30 October 2008), pp. 59-60. http://www.linguateca.pt/Diana/download/SantosCardosoGIR08.pdf Slides
[Santos & Cardoso 2009]
Diana Santos & Nuno Cardoso. "REMando para o futuro: reconhecimento de entidades mencionadas e não só". Escola de Verão Belinda Maia (Edv 2009) (FLUP, Porto, Portugal, 29 June - 3 July 2009). Slides
[Larson 2009]
Ray R. Larson. "Interactive Probabilistic Search for GikiCLEF". GikiCLEF overview session at CLEF workshop (GikiCLEF) (Corfu, Greece, 30 September - 2 October). Slides
[Santos & Cabral 2009]
Diana Santos & Luís Miguel Cabral. "GikiCLEF: Crosscultural issues in an international setting: asking non-English-centered questions to Wikipedia". In Francesca Borri, Alessandro Nardi & Carol Peters (eds.), Cross Language Evaluation Forum: Working notes for CLEF 2009 (Corfu, Greece, 30 September - 2 October), Springer. Slides http://www.linguateca.pt/Diana/download/SantosCabralCLEF2009WN.pdf
[Cardoso 2009]
Nuno Cardoso. "GikiCLEF topics and Wikipedia articles: did it blend?". CLEF2009 (Corfu, Greece, 30 September - 2 October). Poster
[Hartrumpf & Leveling 2009]
Sven Hartrumpf & Johannes Leveling. "GIRSA-WP at GikiCLEF: Integration of Structured Information and Decomposition of Questions". GikiCLEF overview session at CLEF workshop (GikiCLEF) (Corfu, Greece, 30 September - 2 October). Slides
[Dornescu 2009]
Iustin Dornescu. "EQUAL - Encyclopaedic QA for Lists". GikiCLEF overview session at CLEF workshop (GikiCLEF) (Corfu, Greece, 30 September - 2 October). Slides
[Santos et al. 2009]
Diana Santos, Nuno Cardoso, Paula Carvalho, Iustin Dornescu, Sven Hartrumpf, Johannes Leveling & Yvonne Skalban. "GikiP at GeoCLEF 2008: Joining GIR and QA forces for querying Wikipedia". In Carol Peters, Tomas Deselaers, Nicola Ferro, Julio Gonzalo, Gareth J. F. Jones, Mikko Kurimo, Thomas Mandl, Anselmo Peñas & Viviane Petras (eds.), Evaluating Systems for Multilingual and Multimodal Information Access: 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, Aarhus, Denmark, September 17-19, 2008, Revised Selected Papers, Springer, 2009, pp. 894-905. http://www.linguateca.pt/Diana/download/SantosetalGikiPCLEF2008Springer2009.pdf
[Santos et al. 2010]
Diana Santos, Nuno Cardoso & Luís Miguel Cabral. "How geographic was GikiCLEF? A GIR-critical review". (FCUL, Lisbon, 26 January 2010). Slides
[Santos et al. 2010]
Diana Santos, Nuno Cardoso & Luís Miguel Cabral. "How geographical was GikiCLEF? A GIR-critical review". In 6th Workshop on Geographic Information Retrieval (GIR'10) (Zurich, 18-19 February). http://www.linguateca.pt/Diana/download/SantosCardosoCabralGIR2010.pdf
[Santos et al. 2010]
Diana Santos, Luís Miguel Cabral, Pamela Forner, Corina Forascu, Fredric Gey, Katrin Lamm, Thomas Mandl, Petya Osenova, Anselmo Peñas, Alvaro Rodrigo, Julia Schulz, Yvonne Skalban, Erik Tjong Kim Sang & Nuno Cardoso. "GikiCLEF: Crosscultural issues in multilingual information access". Proceedings of the International Conference on Language Resources and Evaluation (LREC 2010) (Valletta, Malta, 17-23 May 2010). Poster
[Santos et al. 2010]
Diana Santos, Luís Miguel Cabral, Corina Forascu, Pamela Forner, Fredric Gey, Katrin Lamm, Thomas Mandl, Petya Osenova, Anselmo Peñas, Alvaro Rodrigo, Julia Schulz, Yvonne Skalban & Erik Tjong Kim Sang. "GikiCLEF: Crosscultural issues in multilingual information access". In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner & Daniel Tapias (eds.), Proceedings of the International Conference on Language Resources and Evaluation (LREC 2010) (Valletta, Malta, 17-23 May 2010), European Language Resources Association, pp. 2346-2353. http://www.linguateca.pt/Diana/download/SantosetalGikiCLEF.pdf
[Santos & Cabral 2010]
Diana Santos & Luís Miguel Cabral. "GikiCLEF: Expectations and lessons learned". In Carol Peters, Giorgio Di Nunzio, Mikko Kurimo, Thomas Mandl, Djamel Mostefa, Anselmo Peñas & Giovanna Roda (eds.), Multilingual Information Access Evaluation, Vol. I, September 2010, Springer, pp. 212-222. http://www.linguateca.pt/Diana/download/SantosCabralSpringer2010.pdf
[Cardoso 2010]
Nuno Cardoso. "GikiCLEF topics and Wikipedia articles: Did they blend?". In Carol Peters, Giorgio Di Nunzio, Mikko Kurimo, Thomas Mandl, Djamel Mostefa, Anselmo Peñas & Giovanna Roda (eds.), Multilingual Information Access Evaluation, Vol. I, September 2010, Springer.
[Costa et al. 2012]
Luís Costa, Cristina Mota & Diana Santos. "SIGA, a System to Manage Information Retrieval Evaluations". In Computational processing of the Portuguese language (PROPOR2012) (Coimbra, April 2012), pp. 173-184. http://www.linguateca.pt/Diana/download/CostaetalPROPOR2012.pdf
[Mota et al. 2012]
Cristina Mota, Alberto Simões, Cláudia Freitas, Luís Costa & Diana Santos. "Págico: Evaluating Wikipedia-based information retrieval in Portuguese". In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk & Stelios Piperidis (eds.), Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) (Istanbul, 23-25 May 2012), pp. 2015-2022.