Págico - Português Mágico
Motivation
There are five reasons underlying Págico's design:
- The limitations of current NLP systems
- The limited interest in Portuguese-speaking culture, including Brazilian, Angolan, Timorese, Cape Verdean and other subjects, even in the Portuguese-speaking countries themselves
- The English-speaking bias of the Portuguese Wikipedia
- The need to join forces with teaching of lusophone culture
- The human-machine competition/emulation/connection
The limitations of current NLP systems
Current natural language processing systems (for Portuguese and for other languages) are still too concerned with matters internal to their own discipline, such as treebanks, named entity recognizers, spelling checkers, etc. While it is natural to develop tools and resources first so that they can then be applied, this development should proceed in tandem with work on real-world problems.
Págico tries to change this situation by requiring useful functionality in a joint human-machine enterprise: information access in the context of Portuguese-speaking culture (and its teaching).
We note that perfection and usefulness are not always closely tied, and we believe it is time to devote more attention to systems that are useful even if not perfect.
Lack of interest in the Portuguese-speaking community
At Linguateca we believe it is wrong to look uncritically to foreign models, and especially the Anglo-American model, as the authority and the measure of quality for our work, which should instead be evaluated by standards related to the Portuguese language and its various cultures.
This lack of interest is unfortunately also a consequence of the new funding models in Portugal and Brazil, which give more weight to publishing in English than in Portuguese, despite the more than 200 million speakers of Portuguese.
So one of the goals of Págico is to help the community focus on the challenges of Portuguese.
Wikipedia's English-speaking bias
While one would expect the Portuguese Wikipedia to center on the events, people and history of Brazil and the other Portuguese-speaking countries, we fear that it contains much more translation than real creation. (Particular examples can be found in the Portuguese version of this page.)
This means that the Portuguese Wikipedia, in addition to being biased towards science fiction, fantasy and computer science, as Tony Veale has pointed out about the English Wikipedia, also has a non-negligible bias towards the globalizing English-speaking trend.
The need to join forces with teaching of lusophone culture
To counteract the deplorable situation described in the two previous sections, it appears sensible to join NLP with the teaching of language and culture. Págico does just this: by creating topics around central facts and issues in Portuguese-speaking cultures, which can also be used for teaching and learning, it may encourage specialists to devote work to improving Wikipedia.
In addition, asking the Portuguese-speaking community to compete on topics that deal with places, fictional or real, of our cultures may foster better mutual knowledge among those cultures.
The human-machine competition/emulation/connection
Finally, it is undeniable that competition between man and machine has media appeal, and one of the best recent publicity events for NLP has been IBM's "Watson", a computer that won the American TV game show Jeopardy! (see Martin Ringel's presentation, IBM Watson and Jeopardy!, at the NoTur 2011 conference).
So, by allowing people (alone or in teams) to try their luck in Págico as well, we not only increase participation but will also be able to examine the differences between automatic and human approaches and results, in search of the optimal collaboration scheme between people and computers.
Last update: 24 June 2011.