GikiCLEF 2009 was an evaluation task under the scope of CLEF. Its aim was to evaluate systems which find Wikipedia entries / documents that answer a particular information need requiring geographical reasoning of some sort. GikiCLEF was the successor of the GikiP pilot task, which ran in 2008 under GeoCLEF.
Organization of GikiCLEF was led by Linguateca with the help of a large multilingual organization committee.
GIRA, the full GikiCLEF resource package, is available here, as well as in a zip file, containing
- the pool
- the topics and their documentation
- the source code of all programs
Important dates
- September 2010 - Book on CLEF 2009, including papers on GikiCLEF, available
Already past (in reverse order):
- 29 November 2009 - Final paper due for CLEF Springer book
- 30 September - 2 October 2009 - CLEF workshop, see GikiCLEF at the CLEF workshop in Corfu.
- 23 August 2009 - Deadline for Working notes paper and extended abstract
- 27 July 2009 - final GikiCLEF results made available
- 24 June 2009 - topics, including clarification, made publicly available to everyone.
- 17 June 2009 - First results available for inspection of participants
- 4 June 2009 - Assessment started
- 31 May 2009 - Deadline for run submission.
- 15 May 2009 - Topic release, opened run submission.
- March 2009 - Discussion on the final definition of the GikiCLEF task, with corresponding publication of final participation guidelines.
- 26 January 2009 - Final deadline for registration
- 20 January 2009 - Final GikiCLEF collections (1.0) released.
- 28 November 2008 - Raw GikiCLEF collections released.
- 30 October 2008 - Presentation of GikiCLEF at the GIR workshop held at CIKM 2008, Napa Valley, CA, USA
- 9 October 2008 - GikiCLEF mailing list open, call for participation and for discussion on the task
Main author/editor of the Website: Diana Santos
Task description
For GikiCLEF, systems will need to answer or address geographically challenging topics over the Wikipedia collections, returning Wikipedia document titles as a list of answers.
The user model GikiCLEF systems cater for is anyone interested in knowing something that might already be included in Wikipedia, but who lacks the time or imagination to browse it manually.
So, in practice, a system participating in GikiCLEF receives a set of topics -- representing valid and realistic user needs, preferably from non-English users -- in all GikiCLEF languages, and has to produce a list of answers in all the languages in which it can find answers.
The motivation for this kind of system behaviour is that, in a real environment, a post-processing module would filter the information per language, or rank it in order of preference, depending on the kind of human user and on the languages that user can read. We assume that people prefer to read answers in their native language, but that many people are happy with answers (answers are titles of Wikipedia entries) in other languages they also know or even just slightly understand.
Such post-processing is not relevant for the GikiCLEF evaluation for now, since the task congregates many people with different favourite languages. This is why systems may choose a subset of languages when registering.
GikiCLEF languages
We have organizers and/or participants interested in the following (Wikipedia) languages:
Bulgarian, Dutch, English, German, Italian, Norwegian (both Bokmål and Nynorsk), Portuguese, Romanian and Spanish.
Given that Wikipedia is a multilingual collection and the GikiCLEF collection has 10 versions, what does registering for a set of particular languages mean in GikiCLEF? It means simply that participants are only targeting users of the languages they register for, i.e., that their systems will try to answer in those languages.
In any case, the final overall score is given as a composite of success in all languages, but we will also produce scores per language.
GikiCLEF collections
The Wikipedia collections for all GikiCLEF languages are available to download (latest version is v1.0, released on 20th January, 2009).
We make available the data in three formats:
- an HTML dump, released by Wikipedia at static.wikipedia.org
- a SQL dump, released by Wikipedia at download.wikipedia.org (just for completeness)
- and an XML version, created by the GikiCLEF organization with the WikiXML tool from the above-mentioned SQL dump
Unfortunately, we cannot guarantee that the HTML and the SQL dumps are based on the very same content, but we managed to get most of the collections from June 2008.
Participant systems can use either the XML or the HTML version of the collections to provide the answers. GikiCLEF specifies only the submission format, whose answers have to point to valid HTML or XML files in the GikiCLEF collection. Other than that, systems can use whatever they want.
Here is a table with the generation dates reported by Wikipedia:
Language | HTML dump date | SQL dump date |
---|---|---|
bg | 7th June, 2008 | 13th June, 2008 |
de | 1st July, 2008 | 7th June, 2008 |
en | 19th to 21st June, 2008 | 24th May and 14th July, 2008 (*) |
es | 17th June, 2008 | 28th June, 2008 |
it | 20th June, 2008 | 26th June, 2008 |
nl | 26th June, 2008 | 9th June, 2008 |
nn | 22nd June, 2008 | 20th June, 2008 |
no | 22nd June, 2008 | 10th June, 2008 |
pt | 28th June, 2008 | 25th June, 2008 |
ro | 25th June, 2008 | 10th June, 2008 |
(*) Note: The category.sql.gz file is from 14th July, because the respective file for the English dump of 24th May is missing from the Wikipedia servers. No dumps were generated in June 2008, so we chose the 24th May dump.
These are the numbers of documents included in the XML collection: total number of documents, number of pages (namespace 0), number of templates (namespace 10), number of categories (namespace 14) and number of images (namespace 6).
Language | Total | Pages | Templates | Categories | Images |
---|---|---|---|---|---|
en | 6587912 | 5255077 | 154788 | 365210 | 812837 |
de | 1553181 | 1324321 | 17967 | 53610 | 157283 |
pt | 896698 | 830759 | 16647 | 48761 | 531 |
it | 851242 | 676166 | 63327 | 51984 | 59765 |
es | 714294 | 641852 | 11885 | 60556 | 1 |
nl | 711116 | 644178 | 23090 | 37544 | 6304 |
no | 303796 | 267893 | 8932 | 26023 | 948 |
ro | 214157 | 148691 | 9168 | 29023 | 27275 |
bg | 121320 | 94452 | 5755 | 8464 | 12649 |
nn | 78533 | 63505 | 2449 | 12158 | 421 |
Download the GikiCLEF collections
For details on problems found with the GikiCLEF XML collection, see XML troubles.
Topics
Fifty (50) topics were prepared for GikiCLEF 2009. The topics were made available to the participants on 15 May 2009, in all GikiCLEF languages. From 24 June 2009, further clarifications and use cases, prepared by the topic managers and used by the assessors during the assessment process, are also available here:
Questions, further clarification and use cases, in English
The list of topics, as the participants received them, is here in each language:
- Bulgarian
- Dutch
- English
- German
- Italian
- Norwegian (bokmaal)
- Norwegian (nynorsk)
- Portuguese
- Romanian
- Spanish
Example topics
We have prepared 24 example topics for GikiCLEF:
- in xml: Bulgarian, Dutch, English, German, Italian, Norwegian (bokmaal), Norwegian (nynorsk), Portuguese, Romanian and Spanish.
- in text (UTF-8): Bulgarian, Dutch, English, German, Italian, Norwegian (bokmaal), Norwegian (nynorsk), Portuguese, Romanian and Spanish.
You are also welcome to see the examples from GikiP 2008 (English, German and Portuguese), although the format was different.
Parsed versions of the example topics
Guidelines for topic creation
The topic choice committee strove to devise topics with crosslingual and cultural interest, so that the need for looking in Wikipedia in different languages is real and not artificial.
GikiCLEF topics should conform to the following criteria:
- realistic topics which can be answered in some Wikipedia covered by GikiCLEF
- most topics will be chosen with a cultural bias, so that not every Wikipedia is expected to contain that information
- topics may require cultural knowledge to understand how they should be answered (or rather, what it is that is being sought); this may mean that translation into other languages requires lengthy explanations
- answers have to be justified in at least one Wikipedia (that is, the string may be found as an entry in all Wikipedias, but the rest of the information has to be found in at least one)
- questions may include ambiguous concepts or names. In that case, participant systems have to accept that only answers related to the proper disambiguation will be considered correct e.g. Which countries did Bush visit in the first two years of his mandate? will not be correctly answered by Kate Bush's travels
- in case ambiguities appear in the topic formulation that have not been discussed or clarified in the narrative, and which have more than one acceptable interpretation (i.e., a plausible user model), the assessment will accept all of them. E.g. for Award-winning Romanian actresses in international cinema festivals, one would have to accept actresses actually receiving prizes, as well as those merely in the audience or even hosting the event.
- different answers about the same subject are welcome, though, as in Who is (considered to be) the founder of mathematics? or Name the greatest scientific breakthroughs in the XIXth century
The GikiCLEF topic management system
Submission format
Submissions should be encoded in UTF-8, and all non-blank lines will refer to answers to the topics -- blank lines are accepted and will be ignored.
Each answer is divided into fields by a group of spacing characters (\s+), as illustrated by the sketch after this list:
- First field: topic id
- Second field: Wikipedia page identifier that represents the answer
- Third field: a list (possibly empty) of Wikipedia page identifiers that justify the answer. The justifications are enclosed in braces {} and separated by spaces. In case of no justifications, the system must still output empty braces ({}), in order to preserve the token count and allow further fields.
- Fourth and other fields: Not assigned (yet).
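As an illustration of this format, here is a minimal Python sketch (a hypothetical helper, not an official GikiCLEF tool) that splits one submission line into its fields, assuming the whitespace-separated layout and the brace-enclosed justification list described above:

```python
import re

def parse_answer_line(line):
    """Parse one GikiCLEF-style answer line into (topic_id, answer, justifications).

    Hypothetical helper for illustration only; it assumes the layout described
    above: topic id, answer page identifier, and a brace-enclosed (possibly
    empty) list of justification page identifiers, separated by whitespace.
    """
    line = line.strip()
    if not line:                       # blank lines are accepted and ignored
        return None
    match = re.match(r'^(\S+)\s+(\S+)\s+\{([^}]*)\}', line)
    if match is None:
        raise ValueError("malformed answer line: %r" % line)
    topic_id, answer, justification_field = match.groups()
    justifications = justification_field.split()   # '{}' yields an empty list
    return topic_id, answer, justifications
```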
Example submissions
We have created a varied set of example submissions to better illustrate the submission format:
- example 0 corresponding to a fictitious topic GC-2009-99 What countries joined the EU in 1986?
- example 1 corresponding to imaginary topics
- example 2 corresponding to the example topics
- example 3 corresponding to the example topics
- example 4 corresponding to the example topics
Examples 0 and 1 give answers grounded solely on documents from the collection in XML format. Answers grounded on documents from the static HTML collection (also available in the GikiCLEF collection directory) are also accepted, and should then have the '.html' extension. Participants may choose between the XML collection and the static HTML dump to base their answers on, but no additional points will be awarded for returning the same answer in both its .xml and .html versions.
In example 0 above, the three 'Portugal' answers would be considered correct if in at least one of the languages there was a justification for the validity of the statement (and in no other GikiCLEF language there was an explicit denial of it).
Further comments on example 1:
- order of answers is not relevant
- after the third field anything (so far) can happen
- in principle, justifications are in the same (language) Wikipedia as the answer (recall that a justification in one language is enough for the answer to be correct, but this is computed afterwards, per run)
- the same answer with different justifications can be sent in as a different answer
- one can provide the answer as justification (recursive): although this does not make sense for plain GikiCLEF, it may for more complex presentation issues, to be specified in the fourth or later fields
Examples 2 to 4 are more realistic example submissions, which are related to the example topics, and were conceived also to test the submission and assessment systems, first, and the evaluation system, later.
Some errors were included in both examples 2 and 4.
Participation guidelines
See the GikiCLEF participation guidelines in 2009 for details.
Advanced issues
As a result of the reflection after GikiP, the following suggestions for future improvement of the systems were made:
- Improve presentation of the results: To devise user-friendly systems, an unordered list of answers is often not enough, especially when multiple answers can be related. So, one might reward ordered lists (for instance by granularity given a particular ontology, or by time if the question concerns a particular temporal journey).
- Investigate geographical diversity: Another issue that is now receiving some attention is taking geographical diversity into account: depending on the kind of topic, one might want to boost diversity instead of mere quantity. In fact, for some users and uses, returning too (geographically) close hits may be considered annoying instead of relevant.
We request feedback from participants on these two subjects.
Since, as far as we know, they are fairly new and therefore lack established measures or practices, we would be willing to foster discussion around them and try out really novel ideas if there are interested participants.
Evaluation and assessment
In this section we provide information on the two processes: assessment by the judges and evaluation of the systems given this assessment. We then present the final GikiCLEF results.
General rules of GikiCLEF 2009
In GikiCLEF, only answers / documents of the correct type are considered correct. That is, if the question asks about people, answering with the name of an organization will be considered wrong, even if the document whose title is that organization clearly justifies the person one would want as the answer.
All answers returned by the participant systems will be pooled and then manually assessed by the organization.
For an answer to be considered correct, it has to mention (either in the page or in the justification chain) enough information for a person to be able to judge it (NB: we are not requesting a justification field; it is OK if the justification is found in the page itself).
If this happens in any of the languages in which the system found that answer, all of them are considered correct. Conversely, if the system was not able to justify the answer in any language, the answers will all be considered wrong.
In other words, an answer has to be assessable by a human judge. Correct answers with no justification are considered useless and therefore incorrect.
On the other hand, information about the existence of justification is propagated for each answer across languages, for a particular run. (Except if there are explicit conflicts between the information in the pages in the different languages. In that case, each answer will have to be scored independently, and non-justified answers will not be considered.)
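The cross-language propagation just described can be summarised in a small sketch, under the simplifying assumption that there is no explicit conflict between languages (the case handled separately above); the per-answer data structure is hypothetical, for illustration only:

```python
def propagate_justification(occurrences):
    """Apply the cross-language justification rule to one pooled answer of a run.

    occurrences: list of dicts, one per language in which the run returned this
    answer, e.g. {'lang': 'pt', 'justified': False}  (hypothetical structure).
    If the answer is justified in at least one language, every occurrence counts
    as correct; otherwise, all occurrences are considered wrong.
    """
    justified_somewhere = any(o['justified'] for o in occurrences)
    return [dict(o, correct=justified_somewhere) for o in occurrences]
```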
The evaluation measures are as follows for a given run, and for each language:
- C: number of correct (that is, justified in at least one language) answers
- N: total number of answers provided by the system
- GikiCLEF score per language: C*C/N (so one has a score for de, pt, etc, as C_de*C_de/N_de, C_pt*C_pt/N_pt, etc.)
The final score of any system is given by the sum of the scores for each individual language.
So, the more languages a system returns answers in, the better its score. Furthermore, a language with no answers for a particular topic (C=0) will not contribute to the relative ordering of the systems.
Note that the score for a particular language is the sum over all topics, not the average of the scores per topic. This is in order not to penalize languages whose Wikipedia has no information on a particular topic.
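The scoring described above can be summarised in a short sketch, assuming (hypothetically, for illustration) that the assessed answers of one run are available as per-language lists of correctness flags:

```python
def gikiclef_score(assessed_run):
    """Compute per-language and final GikiCLEF scores for one run.

    assessed_run: dict mapping a language code (e.g. 'de', 'pt') to a list of
    booleans, one per answer returned in that language, True if the answer was
    judged correct, i.e. justified in at least one language (hypothetical
    input structure, for illustration only).
    """
    per_language = {}
    for lang, answers in assessed_run.items():
        n = len(answers)                                # N: answers returned
        c = sum(answers)                                # C: correct answers
        per_language[lang] = (c * c / n) if n else 0.0  # C*C/N per language
    final = sum(per_language.values())                  # sum over languages
    return per_language, final
```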
The assessment process
This section describes the assessment process in more detail. Note that assessment automatically receives information from the answers already inserted in the system by the topic owners, during topic development.
As mentioned in the previous section, all answers will be pooled, but the same answer with different justifications (that is, different pages in the justification field) has to be considered and assessed separately, in order to validate the justification presented by the system.
For a subset of the answers, more than one assessor is called to judge.
For each answer candidate, assessors have to indicate, in a first pass, one of three alternatives: a) correct; b) incorrect; c) uncertain (= don't know). (While both the correct and incorrect choices presuppose definite knowledge of the assessor about the topic, the last alternative presupposes that the assessor first tried to assess the correctness fairly by viewing the justifications offered. Consulting other external sources is not required, nor even desirable.)
Then, they have to state whether the page itself, together with the justification chain, provides enough justification for that answer, by choosing between a) Justified or b) Not justified.
They are also required to write a comment in the (hopefully rare) cases where, although the answer has been (automatically) flagged as correct, the material in the page explicitly contradicts it. Any other comments on difficult choices or doubts are also welcome.
Then the conflict resolution procedure takes place, carried out by the administrators, who contact the opposing parties if needed or directly correct the verdicts. After this process, assessors are informed of all changes to their judgements so that they can object or revise other choices.
After all answers have been assessed and problematic cases discussed, the score for individual runs is computed by the GikiCLEF evaluation system.
Further information:
- Assessment in the GikiCLEF assessment system: We have developed a complex system to help a large number of assessors to work cooperatively in scoring participant systems, whose internals and rationale are described here
- Precise guidelines for GikiCLEF assessors: given a particular answer to a particular question, how to proceed, with a detailed example.
- First information about conflict resolution: when we discovered that the assessment process was not that easy after all, and some information on how conflicts were handled.
The evaluation process
The evaluation program computes the evaluation measures, taking into consideration the more complicated cases:
- correct answers but no justification in any language -- deemed incorrect
- answers which depend on the language (for cases in which different language Wikipedias have conflicting answers)
Scores are created per language and total.
In any case, and although assessment is complicated and refined, evaluation is very easy:
- a system only gets a score of 1 for a particular answer if that answer is correct and justified in that particular run. Wild guesses, although correct, will not receive any score whatsoever
- all other cases will be marked as incorrect, or rather scored as 0
We may be able to provide other scores later, as well as more information, but these will not be official, because they were not part of the task definition when participants entered the GikiCLEF 2009 evaluation contest.
GikiCLEF 2009 results
Global results (and per language results) are shown in http://www.linguateca.pt/GikiCLEF/resources/GikiCLEF_results.html
Further statistics per topic can be seen in http://www.linguateca.pt/GikiCLEF/resources/GikiCLEF_statistics.html
GikiCLEF 2009 resources
For the listing of correct and justified answers only, you can download the file http://www.linguateca.pt/GikiCLEF/resources/GikiCLEF_answers_correct_justified.txt (1009)
For the listing of all answers that were considered correct, both justified and unjustified, you can download http://www.linguateca.pt/GikiCLEF/resources/GikiCLEF_answers_correct.txt (1621)
We are currently finishing the GIRA package with every resource compiled under GikiCLEF 2009 for easy access.
Organization committee
GikiCLEF is being co-organized by (in alphabetical order of last name):
- Sören Auer
- Gosse Bouma
- Luís Miguel Cabral
- Nuno Cardoso
- Iustin Dornescu
- Corina Forascu (topic group)
- Pamela Forner (topic group)
- Danilo Giampiccolo (topic group)
- Fredric Gey (topic group)
- Sven Hartrumpf
- Ray Larson
- Katrin Lamm (topic group)
- Johannes Leveling
- Thomas Mandl (topic group)
- Constantin Orasan
- Petya Osenova (topic group)
- Anselmo Peñas (topic group)
- Erik Tjong Kim Sang (topic group)
- Diana Santos (topic group)
- Julia Schulz (topic group)
- Yvonne Skalban (topic group)
- Alvaro Rodrigo Yuste (topic group)
We thank Paula Carvalho and Christian-Emil Ore for help with topic suggestion and preparation, and Anabela Barreiro, Luís Costa, Ana Engh, Laska Laskova, Leda Casanova, Cristina Mota, Rosário Silva and Kiril Simov for help with answer assessment.
Acknowledgements
GikiCLEF is organized under the scope of CLEF, an activity of the TrebleCLEF Coordination Action. Other related evaluation tasks: QA@CLEF, GeoCLEF.
GikiCLEF was funded by Linguateca, itself jointly funded by the Portuguese Government and the European Union (FEDER and FSE) under contract ref. POSC/339/1.3/C/NAC, currently also funded by UMIC and FCCN.
We also gratefully acknowledge support of the TrebleCLEF Coordination Action, ICT-1-4-1 Digital libraries and technology-enhanced learning (Grant agreement: 215231) for the assessment work.
Previous information
If you have already registered, log in to see your submissions or to change registration data. You can also register if you have already registered in CLEF. To reach all participants and interested observers at GikiCLEF, you can still join the GikiCLEF mailing list.
Call for participation
Registration was open until 26 January. Please note that, to participate in GikiCLEF, registration in CLEF is also mandatory (started 4 February).
In any case, if you think that at this late hour your system is still able to participate in GikiCLEF, you can contact us any time before submission closes.
GikiP 2008 pilot task
The GikiP task was accepted as a pilot task in GeoCLEF 2008, and its organization (including topic development, assessments and evaluation of the results) was made by Linguateca.
Please visit the main page of GikiP 2008 for more information on GikiP 2008.
Further material on GikiP or GikiCLEF
- [Santos et al. 2008]
- Diana Santos, Nuno Cardoso, Paula Carvalho, Iustin Dornescu, Sven Hartrumpf, Johannes Leveling & Yvonne Skalban. "Getting geographical answers from Wikipedia: the GikiP pilot at CLEF". In Francesca Borri, Alessandro Nardi & Carol Peters (eds.), Cross Language Evaluation Forum: Working Notes for the CLEF 2008 Workshop (Aarhus, Denmark, 17-19 September 2008). http://www.linguateca.pt/Diana/download/SantosetalWNCLEF2008.pdf
- [Santos & Cardoso 2008]
- Diana Santos & Nuno Cardoso. "GikiP: Evaluating geographical answers from Wikipedia". In 5th Workshop on Geographic Information Retrieval (GIR'08) (Napa Valley, CA, USA, 30 October 2008), pp. 59-60. http://www.linguateca.pt/Diana/download/SantosCardosoGIR08.pdf Slides
- [Santos & Cardoso 2009]
- Diana Santos & Nuno Cardoso. "REMando para o futuro: reconhecimento de entidades mencionadas e não só". Escola de Verão Belinda Maia (Edv 2009) (FLUP, Porto, Portugal, 29 de Junho - 3 de Julho 2009). Slides
- [Santos & Cabral 2009]
- Diana Santos & Luís Miguel Cabral. "GikiCLEF: Crosscultural issues in an international setting: asking non-English-centered questions to Wikipedia". In Francesca Borri, Alessandro Nardi & Carol Peters (eds.), Cross Language Evaluation Forum: Working notes for CLEF 2009 (Corfu, Grécia, 30 Setembro - 2 Outubro), Springer. Slides http://www.linguateca.pt/Diana/download/SantosCabralCLEF2009WN.pdf
- [Hartrumpf & Leveling 2009]
- Sven Hartrumpf & Johannes Leveling. "GIRSA-WP at GikiCLEF: Integration of Structured Information and Decomposition of Questions". GikiCLEF overview session at CLEF workshop (GikiCLEF) (Corfu, Greece, 30 September - 2 October). Slides
- [Dornescu 2009]
- Iustin Dornescu. "EQUAL - Encyclopaedic QA for Lists". GikiCLEF overview session at CLEF workshop (GikiCLEF) (Corfu, Greece, 30 September - 2 October). Slides
- [Larson 2009]
- Ray R. Larson. "Interactive Probabilistic Search for GikiCLEF". GikiCLEF overview session at CLEF workshop (GikiCLEF) (Corfu, Greece, 30 September - 2 October). Slides
- [Cardoso 2009]
- Nuno Cardoso. "GikiCLEF topics and Wikipedia articles: did it blend?". CLEF2009 (Corfu, Grécia, 30 Setembro - 2 Outubro). Poster
- [Santos et al. 2009]
- Diana Santos, Nuno Cardoso, Paula Carvalho, Iustin Dornescu, Sven Hartrumpf, Johannes Leveling & Yvonne Skalban. "GikiP at GeoCLEF 2008: Joining GIR and QA forces for querying Wikipedia". In Carol Peters, Tomas Deselaers, Nicola Ferro, Julio Gonzalo, Gareth J.F. Jones, Mikko Kurimo, Thomas Mandl, Anselmo Peñas & Viviane Petras (eds.), Evaluating Systems for Multilingual and Multimodal Information Access: 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, Aarhus, Denmark, September 17-19, 2008, Revised Selected Papers, Springer, 2009, pp. 894-905. http://www.linguateca.pt/Diana/download/SantosetalGikiPCLEF2008Springer2009.pdf
- [Santos et al. 2010]
- Diana Santos, Nuno Cardoso & Luís Miguel Cabral. "How geographic was GikiCLEF? A GIR-critical review". (FCUL, Lisboa, 26 January 2010). Slides
- [Santos et al. 2010]
- Diana Santos, Nuno Cardoso & Luís Miguel Cabral. "How geographical was GikiCLEF? A GIR-critical review". In 6th Workshop on Geographic Information Retrieval (GIR'10) (Zurich, 18-19 February 2010). http://www.linguateca.pt/Diana/download/SantosCardosoCabralGIR2010.pdf
- [Santos et al. 2010]
- Diana Santos, Luís Miguel Cabral, Corina Forascu, Pamela Forner, Fredric Gey, Katrin Lamm, Thomas Mandl, Petya Osenova, Anselmo Peñas, Alvaro Rodrigo, Julia Schulz, Yvonne Skalban & Erik Tjong Kim Sang. "GikiCLEF: Crosscultural issues in multilingual information access". In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner & Daniel Tapias (eds.), Proceedings of the International Conference on Language Resources and Evaluation (LREC 2010) (Valletta, Malta, 17-23 May 2010), European Language Resources Association, pp. 2346-2353. http://www.linguateca.pt/Diana/download/SantosetalGikiCLEF.pdf
- [Santos et al. 2010]
- Diana Santos, Luís Miguel Cabral, Pamela Forner, Corina Forascu, Fredric Gey, Katrin Lamm, Thomas Mandl, Petya Osenova, Anselmo Peñas, Alvaro Rodrigo, Julia Schulz, Yvonne Skalban, Erik Tjong Kim Sang & Nuno Cardoso. "GikiCLEF: Crosscultural issues in multilingual information access". Proceedings of the International Conference on Language Resources and Evaluation (LREC 2010) (Valletta, Malta, 17-23 May 2010). Poster
- [Cardoso 2010]
- Nuno Cardoso. "GikiCLEF topics and Wikipedia articles: Did they blend?". In Carol Peters, Giorgio Di Nunzio, Mikko Kurimo, Thomas Mandl, Djamel Mostefa, Anselmo Peñas & Giovanna Roda (eds.), Multilingual Information Access Evaluation, VOL I Setembro de 2010, Springer.
- [Santos & Cabral 2010]
- Diana Santos & Luís Miguel Cabral. "GikiCLEF : Expectations and lessons learned". In Carol Peters, Giorgio Di Nunzio, Mikko Kurimo, Thomas Mandl, Djamel Mostefa, Anselmo Peñas & Giovanna Roda (eds.), Multilingual Information Access Evaluation, VOL I Setembro de 2010, Springer, pp. 212-222. http://www.linguateca.pt/Diana/download/SantosCabralSpringer2010.pdf
- [Costa et al. 2012]
- Luís Costa, Cristina Mota & Diana Santos. "SIGA, a System to Manage Information Retrieval Evaluations". In Computational Processing of the Portuguese Language (PROPOR 2012) (Coimbra, April 2012), pp. 173-184. http://www.linguateca.pt/Diana/download/CostaetalPROPOR2012.pdf
- [Mota et al. 2012]
- Cristina Mota, Alberto Simões, Cláudia Freitas, Luís Costa & Diana Santos. "Págico: Evaluating Wikipedia-based information retrieval in Portuguese". In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk & Stelios Piperidis (eds.), Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) (Istanbul, 23-25 May 2012), pp. 2015-2022. pdf, poster