GikiCLEF - Cross-language Geographic Information Retrieval from Wikipedia

References

* Diana Santos, Nuno Cardoso, Paula Carvalho, Iustin Dornescu, Sven Hartrumpf, Johannes Leveling & Yvonne Skalban. "Getting geographical answers from Wikipedia: the GikiP pilot at CLEF". In Francesca Borri, Alessandro Nardi & Carol Peters (eds.), ''CLEF 2008 Working Notes'', Aarhus, Denmark, 17-19 September 2008. [http://clef.isti.cnr.it/2008/working_notes/Santos-paperCLEF2008.pdf Working notes PDF], [http://acdc.linguateca.pt/aval_conjunta/CLEF/GeoCLEF/GikiP2008/SantosetalWNCLEF2008.pdf Local copy PDF].
* Diana Santos, Nuno Cardoso, Paula Carvalho, Yvonne Skalban, Iustin Dornescu, Johannes Leveling & Sven Hartrumpf. [http://acdc.linguateca.pt/aval_conjunta/CLEF/GeoCLEF/GikiP2008/SantosGikiPworkshopCLEF2008.pdf Getting geographical answers from Wikipedia: the GikiP pilot at CLEF]. In ''Working Notes of CLEF 2008'', Aarhus, Denmark, 17-19 September 2008.
* Johannes Leveling & Sven Hartrumpf. [http://acdc.linguateca.pt/aval_conjunta/CLEF/GeoCLEF/GikiP2008/Leveling_gikip_pres.pdf A fully-automatic approach to answer geographic queries: GIRSA-WP at GikiP]. In ''Working Notes of CLEF 2008'', Aarhus, Denmark, 17-19 September 2008.
* Iustin Dornescu. [http://acdc.linguateca.pt/aval_conjunta/CLEF/GeoCLEF/GikiP2008/idornescuGikiP2008.pdf Digging for information WikipediaQAList@wlv at GikiP]. In ''Working Notes of CLEF 2008'', Aarhus, Denmark, 17-19 September 2008.
* Nuno Cardoso. [http://acdc.linguateca.pt/aval_conjunta/CLEF/GeoCLEF/GikiP2008/CardosoGIKIP2008.pdf Towards semantic flavored queries for GIR systems: RENOIR at the GikiP pilot task]. In ''Working Notes of CLEF 2008'', Aarhus, Denmark, 17-19 September 2008.
* Fredric Gey, Ray Larson, Mark Sanderson, Kerstin Bischoff, Thomas Mandl, Christa Womser-Hacker, Diana Santos, Paulo Rocha, Andres Montoyo, Giorgio M. Di Nunzio & Nicola Ferro. [http://www.linguateca.pt/Diana/download/GeyetalEVIA2007.pdf Challenges to Evaluation of Multilingual Geographic Information Retrieval in GeoCLEF]. In ''Workshop on Evaluation of Information Access (EVIA)'', Tokyo, Japan, 15 May 2007, unpaginated.
* Diana Santos & Nuno Cardoso. "GikiP: Evaluating geographical answers from Wikipedia". In ''5th Workshop on Geographic Information Retrieval (GIR'08)'', Napa Valley, CA, USA, 1 November 2008.

Revision as of 14:31, 29 September 2008


GikiCLEF 2009: Crosslingual geographic information retrieval from Wikipedia

GikiCLEF 2009 is an evaluation task in the Question Answering track of the CLEF 2009 campaign. It succeeds the GikiP 2008 pilot task and follows its main guidelines. The task is being co-organized by (the list is not yet complete):

Overview of GikiP 2008 pilot task

GikiP was accepted as a pilot task of the GeoCLEF 2008 main track. It was organized by Linguateca, including topic development, assessment and evaluation of the results.

GikiP's overview paper in the CLEF 2008 Working Notes describes the pilot task and the experiments carried out by its three participants: i) Johannes Leveling and Sven Hartrumpf, from the University of Hagen, Germany (Presentation in PDF); ii) Iustin Dornescu, from the University of Wolverhampton, UK (Presentation in PDF); and iii) Nuno Cardoso, from the University of Lisbon, Portugal (Presentation in PDF).

The GikiP 2008 main page lists the 15 topics used (in English, German and Portuguese), the assessments, and the results achieved by the participants.

GikiCLEF 2009 task description

GikiCLEF keeps the task description of the GikiP pilot task:

   Find Wikipedia entries / documents that answer a particular information need which requires geographical reasoning of some sort. 

GikiCLEF participants must build systems capable of answering a set of geographically challenging topics using the Wikipedia collection(s) of the QA@CLEF main track, returning for each topic the URIs of the documents that contain correct answers (information on how to obtain the collections is provided upon CLEF registration). Examples of GikiP 2008 topics include:

  1. Which African capitals have more than two million inhabitants?
  2. List places where Goethe lived.
  3. What wars occurred on Greek soil?

Call for participation

We see GikiCLEF as a joint evaluation task to which all participants can contribute, in order to improve it and make it suit their needs. We invite all participants to join the mailing list, take an active part in the discussion and make suggestions for the GikiCLEF task. The issues we want to address are the following:

Languages

English, German and Portuguese were used in 2008 and should be used again in GikiCLEF 2009. Could Dutch be added as well? To add a language, we need a native speaker of that language who is available to translate the topics and to assess the results obtained on that language's Wikipedia snapshot.

Collection

Since Wikipedia releases periodic snapshots of its static pages and SQL databases, and the results are a simple list of URLs, we can use more up-to-date Wikipedia snapshots as the collection. This also makes it even easier to add new languages. We could also use some hosting space to store the Wikipedia snapshots, to ensure that the collection used in GikiCLEF is always available and unambiguously defined.
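Because the results are plain lists of URLs, supporting another language edition mostly means changing the host name. The sketch below is illustrative only: it assumes the live-site URL pattern `https://<lang>.wikipedia.org/wiki/<Title>` (spaces written as underscores, other characters percent-encoded), the helper names `article_url` and `result_list` are hypothetical, and URLs pointing into a stored snapshot may use a different path layout.

```python
from urllib.parse import quote


def article_url(lang: str, title: str) -> str:
    """Build a Wikipedia article URL for one language edition.

    Assumption: live-site pattern https://<lang>.wikipedia.org/wiki/<Title>,
    spaces as underscores, other non-safe characters UTF-8 percent-encoded.
    """
    path = quote(title.strip().replace(" ", "_"), safe="/:(),")
    return f"https://{lang}.wikipedia.org/wiki/{path}"


def result_list(lang: str, titles: list[str]) -> list[str]:
    """Turn answer titles into a de-duplicated URL list, keeping first-seen order."""
    seen: set[str] = set()
    urls: list[str] = []
    for title in titles:
        url = article_url(lang, title)
        if url not in seen:  # drop repeated answers for the same article
            seen.add(url)
            urls.append(url)
    return urls
```

For example, `article_url("de", "Köln")` yields `https://de.wikipedia.org/wiki/K%C3%B6ln`; the same function serves English, German, Portuguese or any further language code.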

Topics

The 15 topics of GikiP 2008 were created by the organizers. They have some degree of geographical complexity and require geographic reasoning capabilities from the systems (as befits a GeoCLEF-motivated pilot task), and they are phrased as questions (as befits a QA-flavored task). The main goal is to create topics that are as close as possible to a real information need and as well described as possible in natural language. The topics should span several types, such as those discussed in Gey et al. (2007), since facts of these types are likely to be combined in Wikipedia entries about the relevant subjects.

We are opening the discussion on the total number of topics, and on whether they should be formulated by the organizers or whether all participants should contribute to a pool of topics representing the kinds of questions they are currently tackling in their research.

Evaluation

GikiCLEF only accepts answers (documents) of the correct type: for example, names of people (such as painters or scientists) or names of countries (not of wars or kings), as each topic requires. In GikiP, each system's results were evaluated per topic according to the number of correct hits (N) and precision, using the simple formula mult × N × N / total, where total is the number of answers returned and mult rewards multilinguality. Since precision is N / total, this amounts to mult × N × precision. A system's final score is the average of its per-topic scores.
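The scoring rule can be sketched in a few lines of Python. This is a minimal illustration, not the official GikiP/GikiCLEF scorer: `mult` is passed in as a given factor (how it is derived from the languages covered is not specified here), the function names are hypothetical, and scoring a topic with no returned answers as 0 is an assumption.

```python
from statistics import mean


def topic_score(correct_hits: int, total_answers: int, mult: float = 1.0) -> float:
    """Per-topic score: mult * N * N / total, i.e. mult * N * precision."""
    if not 0 <= correct_hits <= total_answers:
        raise ValueError("need 0 <= correct hits <= answers returned")
    if total_answers == 0:
        # Assumption: a topic with no returned answers scores zero.
        return 0.0
    return mult * correct_hits * correct_hits / total_answers


def final_score(per_topic: list[tuple[int, int, float]]) -> float:
    """Average of the per-topic scores; each item is (N, total, mult)."""
    # statistics.mean raises StatisticsError if per_topic is empty.
    return mean(topic_score(n, total, mult) for n, total, mult in per_topic)
```

For example, a topic with 4 correct hits out of 8 answers and mult = 1 scores 2.0, a topic with 3 correct hits out of 3 answers and mult = 2 scores 6.0, and the final score over these two topics is 4.0. Squaring N rewards recall while the division by total penalizes padding the answer list.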

Important dates

  1. 1 October 2008 - GikiCLEF mailing list opens; call for participation and guideline discussion.
  2. 28-30 October 2008 - Promotion of GikiCLEF at the GIR workshop held at CIKM 2008, Napa Valley, CA, USA.
  3. November 2008-February 2009 - Discussion among participants and organizers on the GikiCLEF evaluation setup.
  4. March 2009 - Final definition of the GikiCLEF task; publication of the task details.
  5. May 2009 - Topic release.
  6. June 2009 - Submission of results.
  7. July 2009 - Release of the results and the assessments.
  8. August 2009 - Submission of GikiCLEF papers for the CLEF 2009 Working Notes.
  9. September 2009 - CLEF workshop in Corfu, Greece.

Acknowledgements
