GikiCLEF - Cross-language Geographic Information Retrieval from Wikipedia

Revision as of 15:00, 14 November 2008

GikiCLEF 2009 is an evaluation task under the scope of CLEF. Its aim is to evaluate systems which find Wikipedia entries / documents that answer a particular information need, which requires geographical reasoning of some sort.

GikiCLEF is the successor of the GikiP 2008 pilot task, which ran in 2008 under GeoCLEF.

Call for participation

Prospective participants are requested to join the GikiCLEF mailing list through the following form.


Task description

For GikiCLEF, systems will need to answer geographically challenging topics over the Wikipedia collections, returning Wikipedia documents as a list of answers.

For example, in GikiP 2008, topics/questions were:

  1. Which African capitals have a population larger than two million inhabitants?
  2. List places where Goethe lived.
  3. Which wars occurred on Greek soil?

And answers, more precisely answer lists, would be:

  1. Algiers, Cairo, Nairobi, Harare, etc.
  2. Germany, Darmstadt, Strasbourg, Frankfurt, etc.
  3. Aetolian War, Cretan War, World War II, etc.

GikiCLEF Languages

So far we have organizers or participants interested in the following languages:

Bulgarian, Dutch, English, German, Italian, Norwegian, Portuguese, Romanian and Spanish.

If you are specifically interested in adding another language, please let us know.

GikiCLEF collections

The Wikipedia collections for all GikiCLEF languages will be made available for pre-processing and testing by the end of 2008. The Wikipedia snapshots for all languages were taken in June 2008.

The Wikipedia snapshots will be converted to XML with the WikiXML tool.

Topics

Fifty (50) topics will be prepared for GikiCLEF. The topic choice committee will strive to devise topics with crosslingual and cultural interest, so that the need for looking in Wikipedia in different languages is real and not artificial.

The topics will be released in early March 2009.

Participants are warmly encouraged to tell GikiCLEF organizers about the kind of topics they are particularly interested in, in order for GikiCLEF to reflect the real needs of the community.

Evaluation

Only answers / documents of the correct type are expected (and will therefore be rewarded). After pooling all answers, they will be manually assessed by the organization.

In GikiP 2008, systems were evaluated according to the number of correct hits (N) and precision, by the simple formula mult*N*N/total, for each topic, where mult rewarded multilinguality. The system's final score was given by the average of the individual scores.
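
As a rough illustration of the GikiP 2008 formula, the Python sketch below computes mult*N*N/total for one topic and averages the per-topic scores. The function names are hypothetical, and mult is passed in as a plain number because its exact definition is not given here. Treating a topic with no returned answers as scoring zero is also an assumption.

```python
def gikip_topic_score(n_correct, n_returned, mult=1.0):
    """Score one topic as mult * N * N / total.

    n_correct  -- N, the number of correct answers returned
    n_returned -- total, the number of answers returned
    mult       -- multilinguality reward (definition assumed external)
    """
    if n_returned == 0:
        return 0.0  # assumption: an empty answer list scores zero
    return mult * n_correct * n_correct / n_returned


def gikip_final_score(topic_scores):
    """Average the individual topic scores into a system score."""
    if not topic_scores:
        return 0.0
    return sum(topic_scores) / len(topic_scores)
```

Because N/total is the precision, the per-topic score can also be read as mult*N*precision.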

We believe that this approach was too simple: feasible for three languages, but in obvious need of improvement. So, for GikiCLEF 2009, we suggest the following:

Answers for each language are scored separately according to N*precision, and the final score of any system is given by the sum of the scores for each individual language.
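
A minimal Python sketch of the proposed scoring follows. It assumes that precision for a language is the number of correct answers divided by the number of answers returned in that language, and that the assessed answers are available as (language, is_correct) pairs. The function name and the input shape are hypothetical.

```python
from collections import defaultdict


def gikiclef_score(assessed_answers):
    """Return (per-language scores, final score).

    assessed_answers -- iterable of (language, is_correct) pairs,
                        one pair per answer the system returned
    """
    correct = defaultdict(int)
    returned = defaultdict(int)
    for language, is_correct in assessed_answers:
        returned[language] += 1
        if is_correct:
            correct[language] += 1

    # N * precision, with precision = N / answers returned in that language
    per_language = {
        lang: correct[lang] * correct[lang] / returned[lang]
        for lang in returned
    }
    # The final score is the sum over the individual languages.
    return per_language, sum(per_language.values())
```

For instance, a system with one correct answer out of two in German and two correct out of two in Portuguese would get 0.5 and 2.0 respectively, for a final score of 2.5.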

Submission format

Submissions should be encoded in UTF-8, with each line representing an answer given by: i) topic id; ii) space/tab separator; iii) file URI.

For example:

GC34 de/e/x/a/example.xml
GC34 bg/o/t/h/other_example.xml
GC35 pt/d/i/f/different_topic.xml
GC35 es/s/t/i/still_on_the_second_topic.xml
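
The format described above can be read with a few lines of Python. This is only a sketch: the function names are hypothetical, skipping blank lines and ignoring leading whitespace are assumptions rather than stated rules, and taking the language from the first path segment of the URI is inferred from the examples.

```python
def parse_submission(text):
    """Parse submission text into a list of (topic_id, uri) pairs."""
    answers = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Split on the first run of spaces or tabs only.
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"line {line_number}: expected 'topic_id URI'")
        topic_id, uri = parts
        answers.append((topic_id, uri))
    return answers


def read_submission(path):
    """Read a UTF-8 encoded submission file from disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_submission(handle.read())


def language_of(uri):
    """Assumed convention: the first path segment is the language code."""
    return uri.split("/", 1)[0]
```

Called on the two lines "GC34 bg/o/t/h/other_example.xml" and "GC35 pt/d/i/f/different_topic.xml", parse_submission would return one (topic id, URI) pair per line, and language_of would give "bg" and "pt" for the two URIs.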

Further improvements/tracks

As a result of the reflection after GikiP, the following suggestions for the future were made. We request feedback from participants on whether they would also be interested in these issues:


Important dates

We intend to avoid the 'rush-months' of CLEF.

  1. Until the end of 2008 - Wikipedia collections made available to participants.
  2. November 2008 - February 2009 - Discussion on the definition of the GikiCLEF final task, and corresponding publication of the participation guidelines.
  3. March 2009 - Topic release and run submission.
  4. (2 weeks after topic release) - Deadline for run submission.
  5. June 2009 - Assessment and GikiCLEF results made available.
  6. September 2009 - CLEF workshop in Corfu, Greece.


Already past

  1. 9 October 2008 - GikiCLEF mailing list open, call for participation and guideline discussion
  2. 30 October 2008 - Presenting GikiCLEF at the GIR workshop held at CIKM 2008, Napa Valley, CA, USA

Organization committee

GikiCLEF is being co-organized by (in alphabetical order):


GikiP 2008 pilot task

The GikiP task was accepted as a pilot task for the GeoCLEF 2008 main track, and its organization (including topic development, assessments and evaluation of the results) was carried out by Linguateca.

Please visit the main page of GikiP 2008 for more information on GikiP 2008.

Further material



[Santos et al. 2008]
Diana Santos, Nuno Cardoso, Paula Carvalho, Iustin Dornescu, Sven Hartrumpf, Johannes Leveling & Yvonne Skalban. "Getting geographical answers from Wikipedia: the GikiP pilot at CLEF". In Francesca Borri, Alessandro Nardi & Carol Peters (eds.), Cross Language Evaluation Forum: Working Notes for the CLEF 2008 Workshop (Aarhus, Denmark, 17-19 September 2008), unpaginated. http://www.linguateca.pt/Diana/download/SantosetalWNCLEF2008.pdf
[Santos & Cardoso 2008]
Diana Santos & Nuno Cardoso. "GikiP: Evaluating geographical answers from Wikipedia". In 5th Workshop on Geographic Information Retrieval (GIR'08) (Napa Valley, CA, USA, 30 October 2008), pp. 59-60. http://www.linguateca.pt/Diana/download/SantosCardosoGIR08.pdf Slides
[Santos & Cardoso 2009]
Diana Santos & Nuno Cardoso. "REMando para o futuro: reconhecimento de entidades mencionadas e não só". Escola de Verão Belinda Maia (Edv 2009) (FLUP, Porto, Portugal, 29 June - 3 July 2009). Slides
[Cardoso 2009]
Nuno Cardoso. "GikiCLEF topics and Wikipedia articles: did it blend?". CLEF2009 (Corfu, Greece, 30 September - 2 October). Poster
[Larson 2009]
Ray R. Larson. "Interactive Probabilistic Search for GikiCLEF". GikiCLEF overview session at CLEF workshop (GikiCLEF) (Corfu, Greece, 30 September - 2 October). Slides
[Hartrumpf & Leveling 2009]
Sven Hartrumpf & Johannes Leveling. "GIRSA-WP at GikiCLEF: Integration of Structured Information and Decomposition of Questions". GikiCLEF overview session at CLEF workshop (GikiCLEF) (Corfu, Greece, 30 September - 2 October). Slides
[Santos & Cabral 2009]
Diana Santos & Luís Miguel Cabral. "GikiCLEF: Crosscultural issues in an international setting: asking non-English-centered questions to Wikipedia". In Francesca Borri, Alessandro Nardi & Carol Peters (eds.), Cross Language Evaluation Forum: Working notes for CLEF 2009 (Corfu, Greece, 30 September - 2 October), Springer. Slides http://www.linguateca.pt/Diana/download/SantosCabralCLEF2009WN.pdf
[Dornescu 2009]
Iustin Dornescu. "EQUAL - Encyclopaedic QA for Lists". GikiCLEF overview session at CLEF workshop (GikiCLEF) (Corfu, Greece, 30 September - 2 October). Slides
[Santos et al. 2009]
Diana Santos, Nuno Cardoso, Paula Carvalho, Iustin Dornescu, Sven Hartrumpf, Johannes Leveling & Yvonne Skalban. "GikiP at GeoCLEF 2008: Joining GIR and QA forces for querying Wikipedia". In Carol Peters, Tomas Deselaers, Nicola Ferro, Julio Gonzalo, Gareth J.F. Jones, Mikko Kurimo, Thomas Mandl, Anselmo Peñas & Viviane Petras (eds.), Evaluating Systems for Multilingual and Multimodal Information Access: 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, Aarhus, Denmark, September 17-19, 2008, Revised Selected Papers, 2009, Springer, pp. 894-905. http://www.linguateca.pt/Diana/download/SantosetalGikiPCLEF2008Springer2009.pdf
[Santos et al. 2010]
Diana Santos, Nuno Cardoso & Luís Miguel Cabral. "How geographic was GikiCLEF? A GIR-critical review". (FCUL, Lisbon, 26 January 2010). Slides
[Santos et al. 2010]
Diana Santos, Nuno Cardoso & Luís Miguel Cabral. "How geographical was GikiCLEF? A GIR-critical review". In 6th Workshop on Geographic Information Retrieval (GIR'10) (Zurich, 18-19 February). http://www.linguateca.pt/Diana/download/SantosCardosoCabralGIR2010.pdf
[Santos et al. 2010]
Diana Santos, Luís Miguel Cabral, Pamela Forner, Corina Forascu, Fredric Gey, Katrin Lamm, Thomas Mandl, Petya Osenova, Anselmo Peñas, Alvaro Rodrigo, Julia Schulz, Yvonne Skalban, Erik Tjong Kim Sang & Nuno Cardoso. "GikiCLEF: Crosscultural issues in multilingual information access". Proceedings of the International Conference on Language Resources and Evaluation (LREC 2010) (Valletta, Malta, 17-23 May 2010). Poster
[Santos et al. 2010]
Diana Santos, Luís Miguel Cabral, Corina Forascu, Pamela Forner, Fredric Gey, Katrin Lamm, Thomas Mandl, Petya Osenova, Anselmo Peñas, Alvaro Rodrigo, Julia Schulz, Yvonne Skalban & Erik Tjong Kim Sang. "GikiCLEF: Crosscultural issues in multilingual information access". In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner & Daniel Tapias (eds.), Proceedings of the International Conference on Language Resources and Evaluation (LREC 2010) (Valletta, Malta, 17-23 May 2010), European Language Resources Association, pp. 2346-2353. http://www.linguateca.pt/Diana/download/SantosetalGikiCLEF.pdf
[Cardoso 2010]
Nuno Cardoso. "GikiCLEF topics and Wikipedia articles: Did they blend?". In Carol Peters, Giorgio Di Nunzio, Mikko Kurimo, Thomas Mandl, Djamel Mostefa, Anselmo Peñas & Giovanna Roda (eds.), Multilingual Information Access Evaluation, Vol. I, September 2010, Springer.
[Santos & Cabral 2010]
Diana Santos & Luís Miguel Cabral. "GikiCLEF: Expectations and lessons learned". In Carol Peters, Giorgio Di Nunzio, Mikko Kurimo, Thomas Mandl, Djamel Mostefa, Anselmo Peñas & Giovanna Roda (eds.), Multilingual Information Access Evaluation, Vol. I, September 2010, Springer, pp. 212-222. http://www.linguateca.pt/Diana/download/SantosCabralSpringer2010.pdf
[Costa et al. 2012]
Luís Costa, Cristina Mota & Diana Santos. "SIGA, a System to Manage Information Retrieval Evaluations". In Computational processing of the Portuguese language (PROPOR2012) (Coimbra, April 2012), pp. 173-184. http://www.linguateca.pt/Diana/download/CostaetalPROPOR2012.pdf
[Mota et al. 2012]
Cristina Mota, Alberto Simões, Cláudia Freitas, Luís Costa & Diana Santos. "Págico: Evaluating Wikipedia-based information retrieval in Portuguese". In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk & Stelios Piperidis (eds.), Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) (Istanbul, 23-25 May 2012), pp. 2015-2022. pdf poster pdf