GIRA: GikiCLEF Resource package

This package can also be downloaded from http://www.linguateca.pt/GikiCLEF/GIRA1.0.zip, and includes the main resources created in the scope of GikiCLEF 2009 (except for the Wikipedia collections). 

GikiCLEF was an evaluation track run under the scope of CLEF, http://www.clef-campaign.org/. Systems had to find Wikipedia entries / documents that answer a particular information need, one that requires geographical reasoning of some sort. GikiCLEF was the successor of the GikiP pilot task, which ran in 2008 under GeoCLEF. Further information can be found at http://www.linguateca.pt/GikiCLEF/.

Contents
1.	Collections (only in the Web version)
2.	Topics
3.	Pool
4.	Evaluation programs
5.	Results
6.	Credits and acknowledgements
 
The zipped package has the following directory structure:

topics/
pool/
programs/


Please cite this package as "GIRA: GikiCLEF Resource package", available from http://www.linguateca.pt/GikiCLEF/GIRA1.0.zip.

1. Collections
------------------------

The GikiCLEF Wikipedia collections (in 10 versions corresponding to nine languages) were obtained from the wikipedia.org static server (both an HTML dump and an SQL dump), and were prepared for GikiCLEF by Nuno Cardoso.

The dates of generation are as follows:

     HTML dump date        SQL dump date
bg   7 June 2008           13 June 2008
de   1 July 2008           7 June 2008
en   19 to 21 June 2008    24 May and 14 July 2008 (*)
es   17 June 2008          28 June 2008
it   20 June 2008          26 June 2008
nl   26 June 2008          9 June 2008
nn   22 June 2008          20 June 2008
no   22 June 2008          10 June 2008
pt   28 June 2008          25 June 2008
ro   25 June 2008          10 June 2008

(*) The English category.sql.gz file is from 14 July, because the corresponding file for the English dump of 24 May is missing from the Wikipedia servers. Since no dumps were generated in June 2008, the 24 May dump was chosen.

A third format, XML, postprocessed from the SQL collection with the WikiXML tool (http://ilps.science.uva.nl/WikiXML/), was also made available.

The Wikipedia collections can be obtained from http://www.linguateca.pt/GikiCLEF/collections/.


2. Topics
------------------------

This directory contains:

- 24 example topics in the subdirectory Examples
- 50 final topics in the subdirectory GikiCLEF 2009, together with their clarification in English (that is, the use cases they were supposed to satisfy) in the file GikiCLEF_2009_clarification.xml

For completeness, we have also stored some other files with detailed information on the creation and assessment of the GikiCLEF topics, available from the main site, under the subdirectory documentation:

- guidelines_creation.txt 
- guidelines_assessment.txt

3. Pool
------------------------
This directory contains the pool of answers assessed in GikiCLEF 2009:

- GikiCLEF_answers_assessment.txt 

The evaluation procedure scored an answer as correct only if it was justified in at least one of the languages in which the system provided results. As a consequence, the same answer could count as correct for one system and not for another.

Therefore we list all correct answers found by the assessors or stored by the topic creators, with two columns, each holding 0 or 1: the first column indicates whether the answer is correct (1) or not (0), the second whether it is justified (1) or not (0).
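
The justification rule described above can be illustrated with a minimal sketch. It is not taken from the SIGA code: the function name, the representation of languages as sets of language codes, and the example answer are all assumptions made for illustration.

```python
from typing import Set


def counts_as_correct(is_correct: bool,
                      justified_langs: Set[str],
                      system_langs: Set[str]) -> bool:
    """Hypothetical helper: a correct answer scores only if it is justified
    in at least one language the system returned results in."""
    return is_correct and bool(justified_langs & system_langs)


# Made-up answer: correct, with a justification found only in English.
justified_in = {"en"}

print(counts_as_correct(True, justified_in, {"en", "pt"}))  # True
print(counts_as_correct(True, justified_in, {"pt"}))        # False
```

The two calls show why the same pooled answer may be scored differently for two systems: only the first system returned results in a language where the justification exists.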

We have also provided two simple lists

- GikiCLEF_answers_correct_justified.txt
- GikiCLEF_answers_correct.txt 

as well as all Wikipedia files included in the pool, listed in docs_ids.txt, corresponding to 9318 documents. These files are included in pool/GikiCLEF2009DocumentPool in the zipped package, and are also available as an archive at http://www.linguateca.pt/GikiCLEF/GIRA/pool/GikiCLEF2009DocumentPoolSIGA.tgz
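
As a rough sketch of how the two 0/1 columns of GikiCLEF_answers_assessment.txt might be read, the following Python assumes a line layout that is NOT documented here: four whitespace-separated fields (topic identifier, answer document, correct flag, justified flag). Check the actual file before relying on it; the record type, function names and sample lines are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Assessment(NamedTuple):
    topic: str
    answer: str
    correct: bool
    justified: bool


def parse_assessments(lines: Iterable[str]) -> List[Assessment]:
    """Parse lines of the ASSUMED form: topic answer correct justified."""
    records = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and anything that does not match the assumed layout.
        if len(fields) != 4 or fields[2] not in ("0", "1") or fields[3] not in ("0", "1"):
            continue
        topic, answer, correct, justified = fields
        records.append(Assessment(topic, answer, correct == "1", justified == "1"))
    return records


def correct_and_justified(records: Iterable[Assessment]) -> List[Assessment]:
    """Keep only the answers flagged both correct and justified."""
    return [r for r in records if r.correct and r.justified]


# Made-up sample lines, not taken from the real pool.
sample = ["T01 doc_a 1 1", "T01 doc_b 1 0", "", "T02 doc_c 0 0"]
records = parse_assessments(sample)
print([r.answer for r in correct_and_justified(records)])  # ['doc_a']
```

Filtering on both flags should correspond to the content of GikiCLEF_answers_correct_justified.txt, and filtering on the first flag alone to GikiCLEF_answers_correct.txt, if the assumed layout holds.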

4. Evaluation programs
---------------------------
4.1 Program description

This directory contains the following programs:
  - Wikitool is a Perl script for generating the XML collection. Further documentation on its use can be found in http://meta.wikimedia.org/wiki/Wikitool
  - SIGA-1.1 (SIstema de Gestão e Avaliação do GIKICLEF) is a cooperative web system that allows the management and assessment of an evaluation contest of the GikiCLEF kind
  - wz_graphics is a JavaScript API for generating dynamic graphics in a web page. This program is included in the SIGA archive. For the latest version and documentation, refer to the wz_graphics homepage at http://www.walterzorn.com/
  - sqlDump0617.tgz - Resource of SIGA. This archive is a dump of the database as of 27 July, so that one can recover the exact state from which the official GikiCLEF 2009 results were computed, for test purposes
  - wiki_documents.bz2 - Resource of SIGA. This archive is a SQL table with the full list of files available in the Wikipedia collections for the GikiCLEF 2009 evaluation, for the same purpose as the previous item


Refer to the README.txt document in the SIGA-1.1.tgz archive for installation instructions. After installation, administration can be done through the web interface once the track has been created. For more recent SIGA versions, please check the SIGA webpage http://www.linguateca.pt/GikiCLEF/SIGA

4.2 Licenses

SIGA is available under the GNU General Public License (GPL)
Wikitool is available under the GNU General Public License (GPL)
wz_graphics is available under the GNU Lesser General Public License (LGPL)
  


6.  Acknowledgments
------------------------------
6.1 Organizers 

GikiCLEF was co-organized by (in alphabetical order of last name) Sören Auer, Gosse Bouma, Luís Miguel Cabral, Nuno Cardoso, Iustin Dornescu, Corina Forascu (topic group), Pamela Forner (topic group), Danilo Giampiccolo (topic group), Fredric Gey (topic group), Sven Hartrumpf, Ray Larson, Katrin Lamm (topic group), Johannes Leveling, Thomas Mandl (topic group), Constantin Orasan, Petya Osenova (topic group), Anselmo Peñas (topic group), Erik Tjong Kim Sang (topic group), Diana Santos (topic group), Julia Schulz (topic group), Yvonne Skalban (topic group), Alvaro Rodrigo Yuste (topic group). 

We thank Paula Carvalho and Christian-Emil Ore for help with topic suggestion and preparation, and Anabela Barreiro, Luís Costa, Ana Engh, Laska Laskova, Leda Casanova, Cristina Mota, Rosário Silva and Kiril Simov for their participation in answer assessment. 

6.2 Funding

The organization work, as well as the writing of this paper, was accomplished under the scope of the Linguateca project, jointly funded by the Portuguese Government, the European Union (FEDER and FSE), under contract ref. POSC/339/1.3/C/NAC, UMIC and FCCN.

We also gratefully acknowledge the support of the TrebleCLEF Coordination Action, ICT-1-4-1 Digital libraries and technology-enhanced learning (Grant agreement: 215231).

Package authors: Luís Miguel Cabral and Diana Santos.
Last change: 27 November 2009.