Making the CHAVE collection available
Linguateca
CHAVE is a Portuguese collection for IR and Q&A created for CLEF in 2004 and updated every year (see the CLEF website, as well as a paper [Santos & Rocha 2004] describing its creation in some detail).
In addition to a large set of documents, namely the 1994 and 1995 full editions of the Público newspaper, we make available the following resources, related to two different tracks:
- Information Retrieval (IR) Ad Hoc Track
  - a list of topics in Portuguese, compiled cooperatively with the other CLEF organizers
  - a pool of (binarily) judged documents for each topic
- Question Answering Evaluation QA@CLEF
  - a list of questions and answers in Portuguese, compiled cooperatively with the other QA@CLEF organizers (can be downloaded directly from here)
  - a (non-exhaustive) set of document ids that support the answer(s) for a subset of the above
This resource is organized as follows:
- Textos - Folder containing the complete texts of the newspapers PÚBLICO and Folha de São Paulo of 1994 and 1995.
- 2004 - The Portuguese resources concerning CLEF2004
- 2005 - The Portuguese resources concerning CLEF2005
- 200x/Monte - Document pools for each topic relative to CLEF200x
- 200x/PerguntasRespostas - Questions and answers compiled by the organization of CLEF200x
- 200x/Topicos - Topics in Portuguese, compiled by the organization of CLEF200x
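As a rough illustration of the layout above, the following Python sketch builds the expected folder paths for a given CLEF year. The root folder name "CHAVE" and the helper names are assumptions made for this example only; the folder names Textos, Monte, PerguntasRespostas and Topicos come from the list above. Nothing is assumed about the file formats inside those folders.

```python
from pathlib import PurePosixPath

# Per-year sub-folder names, taken from the layout listed above.
YEAR_SUBFOLDERS = {
    "pools": "Monte",                           # document pools for each topic
    "questions_answers": "PerguntasRespostas",  # questions and answers
    "topics": "Topicos",                        # topics in Portuguese
}


def texts_path(root):
    """Path of the folder holding the newspaper texts (root is hypothetical)."""
    return PurePosixPath(root) / "Textos"


def year_paths(root, year):
    """Map each resource type to its expected folder for one CLEF year."""
    base = PurePosixPath(root) / str(year)
    return {name: base / folder for name, folder in YEAR_SUBFOLDERS.items()}


if __name__ == "__main__":
    # "CHAVE" is an assumed local root folder, not a name given by the distribution.
    print(texts_path("CHAVE"))
    for name, path in year_paths("CHAVE", 2004).items():
        print(name, path)
```

Calling year_paths with 2005 instead of 2004 gives the corresponding folders for the CLEF2005 resources.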
In order to comply with CLEF tradition, we request that users of CHAVE obey the following conditions:
- Register in order to get the collection
- Reference the following facts: that the collection consists of the 1994 and 1995 complete editions of Público newspaper (www.publico.pt) and Folha de São Paulo (www.folha.com.br), that it was compiled by Linguateca (www.linguateca.pt), and that this compilation occurred in the framework of CLEF (www.clef-campaign.org)
- Use it for research and development only; not for reselling or making profit from its direct distribution (on-line or off-line)
- Do not invoke CLEF's name for results obtained outside the official CLEF campaigns in a way that implies that the system was assessed within CLEF; i.e., it is not acceptable to compare such results with official campaign results without clearly stating that they were obtained outside the campaign. Ideally, one should simply refer to the CHAVE collection.
Please also note that CHAVE is part of a much larger (multilingual) collection (to be) distributed by ELRA, which we strongly encourage anyone interested in CLIR to obtain.
Last update: 14 December 2005.
Send questions, comments and suggestions