
GikiCLEF - Cross-language Geographic Information Retrieval from Wikipedia

(Difference between revisions)



Line 3: Line 3:
=== Introduction ===
=== Introduction ===
-
The GikiCLEF 2009 is an evaluation task for the Question Answering track for the [http://www.clef-campaign.org CLEF 2009 campaign], succeeding the GikiP 2008 pilot task and following its main guidelines. The task is being co-organized by: (the list is not yet complete)
+
The GikiCLEF 2009 is an evaluation task for the Question Answering track for the [http://www.clef-campaign.org/ CLEF 2009 campaign], succeeding the [http://www.linguateca.pt/GikiP GikiP 2008 pilot task] and following its main guidelines. The task is being co-organized by (the list is not yet complete):
-
    * Nuno Cardoso (ncardoso@xldb.di.fc.ul.pt)
+
* Nuno Cardoso (ncardoso@xldb.di.fc.ul.pt)
-
    * Paula Carvalho (pcc@xldb.di.fc.ul.pt).
+
* Paula Carvalho (pcc@xldb.di.fc.ul.pt).
-
    * Diana Santos.
+
* Diana Santos.
-
    * Iustin Dornescu.
+
* Iustin Dornescu.
-
    * Corina Forascu.
+
* Corina Forascu.
-
    * Johannes Leveling.
+
* Johannes Leveling.
-
    * Gosse Bouma.  
+
* Gosse Bouma.  
-
2. Overview of GikiP 2008 pilot task
+
=== Overview of GikiP 2008 pilot task ===
-
The GikiP task was was accepted as a pilot task for the GeoCLEF 2008 main track. The GikiP organization (including topic development, assessments and evaluation of the results) was made by Linguateca.
+
-
GikiP's overview paper on the Working notes of CLEF 2008 details the pilot task and the experiments made by the three participants: i) Johannes Leveling and Sven Hartrumpf, from the University of Hagen, Germany (Presentation in PDF), ii) Iustin Dornescu, from the University of Wolverhampton, UK (Presentation in PDF), and iii) Nuno Cardoso, from the University of Lisbon, Portugal (Presentation in PDF).
+
The GikiP task was accepted as a pilot task for the [http://www.uni-hildesheim.de/geoclef/ GeoCLEF 2008 main track]. The organization of GikiP (including topic development, assessments and evaluation of the results) was handled by [http://www.linguateca.pt Linguateca].
 +
 
 +
[[#References|GikiP's overview paper]] in the working notes of CLEF 2008 details the pilot task and the experiments made by the three participants: i) Johannes Leveling and Sven Hartrumpf, from the University of Hagen, Germany ([[#References|Presentation in PDF]]), ii) Iustin Dornescu, from the University of Wolverhampton, UK ([[#References|Presentation in PDF]]), and iii) Nuno Cardoso, from the University of Lisbon, Portugal ([[#References|Presentation in PDF]]).
 +
 
 +
On the [http://www.linguateca.pt/GikiP/ main page of GikiP 2008] you can find the 15 topics used in [http://acdc.linguateca.pt/aval_conjunta/CLEF/GeoCLEF/GikiP2008_en.xml English], [http://acdc.linguateca.pt/aval_conjunta/CLEF/GeoCLEF/GikiP2008_de.xml German] and [http://acdc.linguateca.pt/aval_conjunta/CLEF/GeoCLEF/GikiP2008_pt.xml Portuguese], the [http://lusiadas.linguateca.pt/GikiP/julgamentos.html assessments] and the [http://lusiadas.linguateca.pt/GikiP/results.html results] achieved by the participants.
 +
 
 +
=== GikiCLEF 2009 task description ===
-
In the main page of GikiP 2008 you can find the 15 topics used in English, German and Portuguese, the assessments and the results achieved by the participants.
 
-
3. GikiCLEF 2009 task description
 
The GikiCLEF task description is the same as that of the GikiP pilot task:
-
     Find Wikipedia entries / documents that answer a particular information need which requires geographical reasoning of some sort.  
+
     ''Find Wikipedia entries / documents that answer a particular information need which requires geographical reasoning of some sort.''
GikiCLEF participants must build systems capable of answering a group of geographically challenging topics, using the Wikipedia collection(s) from the QA@CLEF main track, and returning the URIs of the documents that contain the correct answers for each topic (information on how to obtain the collections is provided upon CLEF registration). Examples of the GikiP 2008 topics include:
-
  1. Which African capital have more than two million inhabitants?
+
# Which African capitals have more than two million inhabitants?
-
  2. List places where Goethe lived.
+
# List places where Goethe lived.
-
  3. What wars occurred in Greek soil?  
+
# What wars occurred on Greek soil?
 +
 
 +
=== Call for participation ===
-
4. Call for participation
 
We see GikiCLEF as a joint-evaluation task, where all participants can contribute in order to improve the task and suit all their needs. We ask all participants to join the mailing list, take an active part, and make suggestions for the GikiCLEF task. The topics we want to address are the following:
-
Languages
+
 
 +
==== Languages ====
English, German and Portuguese were used in 2008, and should again be used in GikiCLEF 2009. Maybe Dutch could be used as well? In order to add a language, we need to ensure that there is a native speaker available for topic translation and for assessing the results on the Wikipedia snapshot for that language.
-
Collection
+
 
-
We are currently using the WiQA collection, but since Wikipedia releases periodic snapshots of the static pages and SQL databases, and the results are a simple list of URLs, we can use more up-to-date Wikipedia snapshots as the collection. This has the advantage that it makes even more easy to add new languages.
+
==== Collection ====
 +
Since Wikipedia releases periodic snapshots of its static pages and SQL databases, and the results are a simple list of URLs, we can use more up-to-date Wikipedia snapshots as the collection. This has the advantage of making it even easier to add new languages.
We could also use some hosting space to store some of the Wikipedia snapshots, to ensure that the collection used in GikiCLEF is always available and unambiguously defined.
-
Topics
+
 
-
The 15 topics of GikiP 2008 were created by the organizers, and they have a degree of geographical complexity and requires some sort of geographic reasoning capabilities from the systems (as a GeoCLEF-motivated pilot task) and are formulated in the form of questions (as a QA-flavored task). The main goal is to create topics that are as close as we can to a true information need, and as well described as possible in natural language. The topics should span several types as as those discussed in Gey et al. (2006), given that these facts are bound to be joined in entries about relevant subjects.
+
==== Topics ====
 +
 
 +
The 15 topics of GikiP 2008 were created by the organizers. They have a degree of geographical complexity and require some sort of geographic reasoning capabilities from the systems (as a GeoCLEF-motivated pilot task), and are formulated as questions (as a QA-flavored task). The main goal is to create topics that are as close as possible to a true information need, and as well described as possible in natural language. The topics should span several types, such as those discussed in [[#References|Gey et al. (2006)]], given that these facts are bound to be joined in entries about relevant subjects.
We are opening the discussion on the total number of topics, and on whether they should be formulated by the organizers or whether all participants should contribute to a pool of topics that somewhat represents the kinds of questions they are currently tackling in their research.
-
Evaluation
+
==== Evaluation ====
Only answers / documents of the correct type are accepted: for example, names of people (painters and scientists), names of countries (not of wars or kings), etc. The systems' results in GikiP were evaluated according to the number of correct hits (N) and precision, by the simple formula mult*N*N/total for each topic, where mult rewards multilinguality. A system's final score is the average of its individual topic scores.
-
5. Important dates
 
-
  1. 1 October 2008 - GikiCLEF mailing list open, call for participation and guideline discussion
+
=== Important dates ===
-
  2. 28-30 October 2008 - Promoting GikiCLEF on the GIR workshop held at CIKM 2008, Napa Valley, CA, EUA.
+
 
-
  3. November 2008-February 2009 - Discussion among participants and organizers on the GikiCLEF evaluation moulds.
+
# 1 October 2008 - GikiCLEF mailing list open, call for participation and guideline discussion
-
  4. March 2009 - Final definition of the GikiCLEF task. Publication of the details of the task.
+
# 28-30 October 2008 - Promoting GikiCLEF at the GIR workshop held at CIKM 2008, Napa Valley, CA, USA.
-
  5. May 2009 - Topic Release.
+
# November 2008-February 2009 - Discussion among participants and organizers on the GikiCLEF evaluation format.
-
  6. June 2009 - Submission of the results.
+
# March 2009 - Final definition of the GikiCLEF task. Publication of the details of the task.
-
  7. July 2009 - Release of the results and the assessments.
+
# May 2009 - Topic Release.
-
  8. August 2009 - GikiCLEF paper submission for the CLEF 2009 working notes.
+
# June 2009 - Submission of the results.
-
  9. September 2009 - CLEF workshop at Corfu, Greece  
+
# July 2009 - Release of the results and the assessments.
 +
# August 2009 - GikiCLEF paper submission for the CLEF 2009 working notes.
 +
# September 2009 - CLEF workshop at Corfu, Greece  
-
Acknowledgements
+
==== Acknowledgements ====
-
References
+
-
    * Diana Santos, Nuno Cardoso, Paula Carvalho, Iustin Dornescu, Sven Hartrumpf, Johannes Leveling & Yvonne Skalban. "Getting geographical answers from Wikipedia: the GikiP pilot at CLEF". In Francesca Borri, Alessandro Nardi & Carol Peters (eds.), CLEF 2008 Working notes (Aarhus, 17-19 September 2008). Working notes PDF, Local copy PDF.
+
==== References ====
-
    * Diana Santos, Nuno Cardoso, Paula Carvalho, Yvonne Skalban, Iustin Dornescu, Johannes Leveling & Sven Hartrumpf. Getting geographical answers from Wikipedia: the GikiP pilot at CLEF (PDF)
+
* Diana Santos, Nuno Cardoso, Paula Carvalho, Iustin Dornescu, Sven Hartrumpf, Johannes Leveling & Yvonne Skalban. "Getting geographical answers from Wikipedia: the GikiP pilot at CLEF". In Francesca Borri, Alessandro Nardi & Carol Peters (eds.), CLEF 2008 Working notes (Aarhus, 17-19 September 2008). Working notes PDF, Local copy PDF.
-
    * Johannes Leveling & Sven Hartrumpf. A fully-automatic approach to answer geographic queries: GIRSA-WP at GikiP (PDF)
+
* Diana Santos, Nuno Cardoso, Paula Carvalho, Yvonne Skalban, Iustin Dornescu, Johannes Leveling & Sven Hartrumpf. Getting geographical answers from Wikipedia: the GikiP pilot at CLEF (PDF)
-
    * Iustin Dornescu. Digging for information WikipediaQAList@wlv at GikiP (PDF)
+
* Johannes Leveling & Sven Hartrumpf. A fully-automatic approach to answer geographic queries: GIRSA-WP at GikiP (PDF)
-
    * Nuno Cardoso. Towards semantic flavored queries for GIR systems: RENOIR at the GikiP pilot task (PDF)
+
* Iustin Dornescu. Digging for information WikipediaQAList@wlv at GikiP (PDF)
-
    * (Gey et al, 2006) Fredric Gey, Ray Larson, Mark Sanderson, Kerstin Bischoff, Thomas Mandl, Christa Womser-Hacker, Diana Santos, Paulo Rocha, Andres Montoyo, Giorgio M. Di Nunzio & Nicola Ferro. Challenges to Evaluation of Multilingual Geographic Information Retrieval in GeoCLEF. In Workshop on Evaluation of Information Access (EVIA) May 15 (Tokyo, Japan, Maio 15 2007 ), s/pp.
+
* Nuno Cardoso. Towards semantic flavored queries for GIR systems: RENOIR at the GikiP pilot task (PDF)
 +
* (Gey et al., 2006) Fredric Gey, Ray Larson, Mark Sanderson, Kerstin Bischoff, Thomas Mandl, Christa Womser-Hacker, Diana Santos, Paulo Rocha, Andres Montoyo, Giorgio M. Di Nunzio & Nicola Ferro. Challenges to Evaluation of Multilingual Geographic Information Retrieval in GeoCLEF. In Workshop on Evaluation of Information Access (EVIA) (Tokyo, Japan, May 15 2007), unpaginated.

Revision as of 14:25, 29 September 2008

Contents

GikiCLEF 2009: Crosslingual geographic information retrieval from Wikipedia

Introduction

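GikiCLEF 2009 is an evaluation task in the Question Answering track of the CLEF 2009 campaign, succeeding the GikiP 2008 pilot task and following its main guidelines. The task is being co-organized by (the list is not yet complete):

  * Nuno Cardoso (ncardoso@xldb.di.fc.ul.pt)
  * Paula Carvalho (pcc@xldb.di.fc.ul.pt)
  * Diana Santos
  * Iustin Dornescu
  * Corina Forascu
  * Johannes Leveling
  * Gosse Bouma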

Overview of GikiP 2008 pilot task

The GikiP task was accepted as a pilot task for the GeoCLEF 2008 main track. The organization of GikiP (including topic development, assessments and evaluation of the results) was handled by Linguateca.

GikiP's overview paper in the working notes of CLEF 2008 details the pilot task and the experiments made by the three participants: i) Johannes Leveling and Sven Hartrumpf, from the University of Hagen, Germany (Presentation in PDF), ii) Iustin Dornescu, from the University of Wolverhampton, UK (Presentation in PDF), and iii) Nuno Cardoso, from the University of Lisbon, Portugal (Presentation in PDF).

On the main page of GikiP 2008 you can find the 15 topics used in English, German and Portuguese, the assessments and the results achieved by the participants.

GikiCLEF 2009 task description

The GikiCLEF task description is the same as that of the GikiP pilot task:

   Find Wikipedia entries / documents that answer a particular information need which requires geographical reasoning of some sort. 

GikiCLEF participants must build systems capable of answering a group of geographically challenging topics, using the Wikipedia collection(s) from the QA@CLEF main track, and returning the URIs of the documents that contain the correct answers for each topic (information on how to obtain the collections is provided upon CLEF registration). Examples of the GikiP 2008 topics include:

  1. Which African capitals have more than two million inhabitants?
  2. List places where Goethe lived.
  3. What wars occurred on Greek soil?

Call for participation

We see GikiCLEF as a joint-evaluation task, where all participants can contribute in order to improve the task and suit all their needs. We ask all participants to join the mailing list, take an active part, and make suggestions for the GikiCLEF task. The topics we want to address are the following:

Languages

English, German and Portuguese were used in 2008, and should again be used in GikiCLEF 2009. Maybe Dutch could be used as well? In order to add a language, we need to ensure that there is a native speaker available for topic translation and for assessing the results on the Wikipedia snapshot for that language.

Collection

Since Wikipedia releases periodic snapshots of its static pages and SQL databases, and the results are a simple list of URLs, we can use more up-to-date Wikipedia snapshots as the collection. This has the advantage of making it even easier to add new languages. We could also use some hosting space to store some of the Wikipedia snapshots, to ensure that the collection used in GikiCLEF is always available and unambiguously defined.

Topics

The 15 topics of GikiP 2008 were created by the organizers. They have a degree of geographical complexity and require some sort of geographic reasoning capabilities from the systems (as a GeoCLEF-motivated pilot task), and are formulated as questions (as a QA-flavored task). The main goal is to create topics that are as close as possible to a true information need, and as well described as possible in natural language. The topics should span several types, such as those discussed in Gey et al. (2006), given that these facts are bound to be joined in entries about relevant subjects.

We are opening the discussion on the total number of topics, and on whether they should be formulated by the organizers or whether all participants should contribute to a pool of topics that somewhat represents the kinds of questions they are currently tackling in their research.

Evaluation

Only answers / documents of the correct type are accepted: for example, names of people (painters and scientists), names of countries (not of wars or kings), etc. The systems' results in GikiP were evaluated according to the number of correct hits (N) and precision, by the simple formula mult*N*N/total for each topic, where mult rewards multilinguality. A system's final score is the average of its individual topic scores.
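
The scoring rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `total` is taken to be the number of answers a system returned for the topic (so that N/total is precision), `mult` is passed in as a plain number because the text does not say how it is derived, and the function names `topic_score` and `final_score` are hypothetical.

```python
from statistics import mean


def topic_score(correct, returned, mult=1.0):
    """Score one topic as mult * N * N / total.

    correct  -- N, the number of correct hits
    returned -- total, assumed here to be the number of answers returned
    mult     -- multilinguality reward (its exact definition is not given in the text)
    """
    if returned == 0:
        # Assumption: a topic with no returned answers scores zero.
        return 0.0
    return mult * correct * correct / returned


def final_score(topics):
    """Average the per-topic scores.

    topics -- iterable of (correct, returned, mult) tuples, one per topic
    """
    return mean(topic_score(n, total, mult) for n, total, mult in topics)


if __name__ == "__main__":
    # Two made-up topics: 3 correct out of 5 returned, and 4 correct out of 4 with mult=2.
    print(final_score([(3, 5, 1), (4, 4, 2)]))
```

Because N appears squared, a system that returns 3 correct answers out of 5 scores higher on that topic than one that returns 1 correct answer out of 1, even though the latter has perfect precision; the measure rewards finding more correct documents, not just avoiding wrong ones.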

Important dates

  1. 1 October 2008 - GikiCLEF mailing list open, call for participation and guideline discussion
  2. 28-30 October 2008 - Promoting GikiCLEF at the GIR workshop held at CIKM 2008, Napa Valley, CA, USA.
  3. November 2008-February 2009 - Discussion among participants and organizers on the GikiCLEF evaluation format.
  4. March 2009 - Final definition of the GikiCLEF task. Publication of the details of the task.
  5. May 2009 - Topic Release.
  6. June 2009 - Submission of the results.
  7. July 2009 - Release of the results and the assessments.
  8. August 2009 - GikiCLEF paper submission for the CLEF 2009 working notes.
  9. September 2009 - CLEF workshop at Corfu, Greece

Acknowledgements

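References

  * Diana Santos, Nuno Cardoso, Paula Carvalho, Iustin Dornescu, Sven Hartrumpf, Johannes Leveling & Yvonne Skalban. "Getting geographical answers from Wikipedia: the GikiP pilot at CLEF". In Francesca Borri, Alessandro Nardi & Carol Peters (eds.), CLEF 2008 Working notes (Aarhus, 17-19 September 2008). Working notes PDF, Local copy PDF.
  * Diana Santos, Nuno Cardoso, Paula Carvalho, Yvonne Skalban, Iustin Dornescu, Johannes Leveling & Sven Hartrumpf. Getting geographical answers from Wikipedia: the GikiP pilot at CLEF (PDF)
  * Johannes Leveling & Sven Hartrumpf. A fully-automatic approach to answer geographic queries: GIRSA-WP at GikiP (PDF)
  * Iustin Dornescu. Digging for information WikipediaQAList@wlv at GikiP (PDF)
  * Nuno Cardoso. Towards semantic flavored queries for GIR systems: RENOIR at the GikiP pilot task (PDF)
  * (Gey et al., 2006) Fredric Gey, Ray Larson, Mark Sanderson, Kerstin Bischoff, Thomas Mandl, Christa Womser-Hacker, Diana Santos, Paulo Rocha, Andres Montoyo, Giorgio M. Di Nunzio & Nicola Ferro. Challenges to Evaluation of Multilingual Geographic Information Retrieval in GeoCLEF. In Workshop on Evaluation of Information Access (EVIA) (Tokyo, Japan, May 15 2007), unpaginated.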