GikiCLEF - Cross-language Geographic Information Retrieval from Wikipedia

Revision as of 14:35, 9 October 2008

GikiCLEF 2009: Cross-language Geographic Information Retrieval from Wikipedia

GikiCLEF 2009 is an evaluation task of the CLEF 2009 campaign, succeeding the GikiP 2008 pilot task. The task is being co-organized by the following people (listed alphabetically by last name; the list is not yet complete):

  * Gosse Bouma
  * Nuno Cardoso (main)
  * …
  * Yvonne Skalban

Call for participation

We see GikiCLEF as a joint evaluation task, in which all participants may help improve the task so that it suits their needs. We are asking all prospective participants to join the mailing list (gikiclef@linguateca.pt).

GikiCLEF 2009 task description

The GikiCLEF evaluation task assesses how well systems find Wikipedia entries / documents that answer a particular information need requiring some sort of geographical reasoning.

GikiCLEF participants must build systems capable of answering a set of geographically challenging topics using the Wikipedia collections, returning for each topic a list of URIs of the documents that contain the correct answers. The output format is still open: some topics might require, for example, a list of answers ordered by date, location or storyline. Examples of the GikiP 2008 topics include:

  1. Which African capitals have more than two million inhabitants?
  2. List places where Goethe lived.
  3. What wars occurred on Greek soil?
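The expected output described above (for each topic, a list of Wikipedia document URIs, possibly in a meaningful order) can be sketched as a small data structure. This is a minimal sketch under stated assumptions: the class names `Answer` and `TopicRun`, the topic identifier and the use of Wikipedia language codes are all hypothetical, and the actual GikiCLEF submission format is still to be decided. Recording the language of each answer makes it easy to see how multilingual a run is.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Answer:
    lang: str  # hypothetical: Wikipedia language code, e.g. "en" or "pt"
    uri: str   # URI of the Wikipedia entry / document returned as an answer


@dataclass
class TopicRun:
    topic_id: str  # hypothetical topic identifier
    answers: list[Answer] = field(default_factory=list)  # kept in insertion order

    def add(self, lang: str, uri: str) -> None:
        """Append an answer, skipping exact duplicates but preserving order."""
        answer = Answer(lang, uri)
        if answer not in self.answers:
            self.answers.append(answer)

    def uris(self) -> list[str]:
        """The list of document URIs, in the order the system produced them."""
        return [a.uri for a in self.answers]

    def languages(self) -> set[str]:
        """Languages covered by this run's answers."""
        return {a.lang for a in self.answers}


run = TopicRun("topic-1")  # hypothetical topic ID
run.add("en", "https://en.wikipedia.org/wiki/Cairo")
run.add("pt", "https://pt.wikipedia.org/wiki/Cairo")
run.add("en", "https://en.wikipedia.org/wiki/Cairo")  # exact duplicate, ignored
print(sorted(run.languages()))  # ['en', 'pt']
```

Because answers keep their insertion order, a system that must order its answers (by date, location or storyline) could simply add them in that order.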

The following GikiCLEF issues are still open:

Languages

In GikiCLEF 2009, topics and collections will be available in English, German, Portuguese, Norwegian, Romanian and Dutch. Participants can suggest other languages; if you want to add another language, please contact the GikiCLEF organizers.

Collection

The organizers will make the Wikipedia static dumps for all GikiCLEF languages available, so that participants can pre-process them and test their systems on the new collections. We intend to release the collections by the end of 2008.

Also under discussion is whether to pre-process the collections with the WikiXML tool, so that they share the format of the GikiP collection, and whether some DBpedia data could also be used.

Topics

Our goal is to create topics that are as close as possible to a true information need and well described in natural language. The topics should be of several kinds and come from different cultures, so that they are covered differently by the various language collections.

We are currently discussing the guidelines for topic creation, the number of topics and their difficulty (for instance, whether to include anaphoric questions), and whether participants should contribute to an initial pool of topics, so that it includes the kinds of questions they are tackling in their own research.

Evaluation

GikiCLEF accepts only answers / documents of the expected type: for example, names of people (such as painters or scientists) or names of countries (not of wars or kings), as each topic requires. In GikiP, each system's results were evaluated per topic by the number of correct hits (N) and precision, combined in the simple formula mult*N*N/total, where total is the number of answers returned and mult rewards multilinguality. A system's final score was the average of its per-topic scores.
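Since precision is N/total, the GikiP per-topic score mult*N*N/total equals mult × N × precision: it grows with the number of correct answers and is discounted by wrong ones. The sketch below computes it, assuming `total` is the number of answers a system returned for the topic and treating `mult` as a given factor, because this page does not say how GikiP computed it. The function names and the zero score for a topic with no answers are illustrative assumptions.

```python
from statistics import mean
from typing import Sequence


def topic_score(correct: int, total: int, mult: float = 1.0) -> float:
    """GikiP-style per-topic score: mult * N * N / total.

    correct (N): number of correct answers returned for the topic.
    total: number of answers returned (assumed), so N / total is precision.
    mult: multilinguality factor, taken as given (its computation is not stated here).
    """
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    if total == 0:
        return 0.0  # assumption: a topic with no answers scores zero
    precision = correct / total
    return mult * correct * precision  # same value as mult * N * N / total


def system_score(topic_scores: Sequence[float]) -> float:
    """Final score: the average of the per-topic scores."""
    return mean(topic_scores) if topic_scores else 0.0


scores = [topic_score(4, 8), topic_score(3, 3, mult=2)]
print(scores, system_score(scores))  # [2.0, 6.0] 4.0
```

For example, 4 correct answers out of 8 returned (mult = 1) score 2.0, and 3 out of 3 with mult = 2 score 6.0, for a final score of 4.0. Squaring N favours recall over a single precise answer, which is one property new GikiCLEF measures may want to revisit.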

For GikiCLEF 2009, new measures are needed to evaluate system performance. We want all participants to contribute to developing them, so that the measures encourage multilinguality and diversity of answers.

Important dates

We aim for early topic development and release, and for an earlier submission deadline than the other CLEF tracks, to avoid the CLEF 'rush months'. The final dates are still being decided by the organizers.

  1. 1 October 2008 - GikiCLEF mailing list open, call for participation and guideline discussion
  2. 28-30 October 2008 - Promoting GikiCLEF at the GIR workshop held at CIKM 2008, Napa Valley, CA, USA.
  3. November 2008 - January 2009 - Discussion among participants and organizers on the GikiCLEF evaluation task methodology.
  4. January - February 2009 - Final definition of the GikiCLEF task. Publication of the details of the task. Topic Release.
  5. March - August 2009 - Submission of the results. Release of the results and the assessments. GikiCLEF paper submission for the CLEF 2009 working notes.
  6. September 2009 - CLEF workshop in Corfu, Greece.

GikiP 2008 pilot task

The GikiP task was accepted as a pilot task of the GeoCLEF 2008 main track. GikiP was organized by Linguateca, including topic development, assessment and evaluation of the results.

The GikiP overview paper in the Working Notes of CLEF 2008 details the pilot task and the experiments of its three participants: i) Johannes Leveling and Sven Hartrumpf, University of Hagen, Germany (presentation in PDF); ii) Iustin Dornescu, University of Wolverhampton, UK (presentation in PDF); and iii) Nuno Cardoso, University of Lisbon, Portugal (presentation in PDF).

The GikiP 2008 main page provides the 15 topics in English, German and Portuguese, the assessments, and the results achieved by the participants.

Acknowledgements

So far, GikiCLEF is funded by Linguateca, which is jointly funded by the Portuguese Government and the European Union (FEDER and FSE) under contract ref. POSC/339/1.3/C/NAC.
