Net-based services at the Text Laboratory

Diana Santos

This document describes the net-based services I set up during my previous job at the Text Laboratory, University of Oslo. Only this introductory remark has been added (2.12.99) since the original time stamp (below).



Introduction

By net-based it is meant both

By services is meant work done to make resources, programs, etc. available to a wider community than those that can be called customers, i.e., people who come to us and want help in solving a particular problem. (This sort of service is also provided at the Text Lab, but is described elsewhere.)

One of the technically simplest services (though not necessarily the simplest in other respects) is providing information, as the present text illustrates. The idea is that most people should first come to us through the net to find a particular piece of information, and only afterwards make direct contact, with a better knowledge of what to expect and what to ask.

There is, however, such a range of problems and questions that a Text Laboratory could be expected to answer that it is not realistic to expect most users to be content with paying us only a virtual visit.

A converse problem is that, given that our institution has relatively few resources and definitely very few people to support the end users, one should not aim to give too grand a view of the Text Laboratory, only to foster disappointment when actual requests have to be seriously postponed or even refused.

For an institution such as ours, it is therefore of the utmost importance that the policy, short-term goals and current work in progress are available to all potential users. This way they can offer suggestions, comments and proposals, as well as make their own appropriate choices of tools and even redirect their own research accordingly. Thus the main purpose of the present text is to share our current activities and objectives with the researchers and students of the Faculty of Arts of the University of Oslo, in order to allow both parties the benefit of information, knowledge, and participation.

Kinds of services

A categorization of our net-based services will be attempted here, in terms of the use of computational resources. It should be noted that the following classification does not necessarily reflect the amount of human work involved, which can vary considerably in any case.

Passive services

After Web publishing, the next most basic service we offer is to make available data and programs created by others, which users can access or use from their own machines. From our side, this involves obtaining licenses from the sources, installing and documenting the resources, and publicizing them or teaching their use.

There are more texts, corpora and programs worldwide than we can afford to install. Therefore it is very important that users specifically ask for the ones they want.

Semi-passive services

The outcome of what we call "semi-passive" services is the same as above, but we have created the resources (data or programs) ourselves. This is the case for the Bosnian corpus, the newspaper texts, and the training and test corpora for the tagger projects. One might also hope that some of the programs developed as support for specific users may in turn acquire this status, in that other users may be able to use them with minimal changes.

In any case, here again the projects giving rise to semi-passive services must originate with the users, who have to request a particular solution, or collect the data, or make their data available, or, hopefully, at least provide some input.

Active services

By active services we mean programs that run on the Text Laboratory machines, and that users therefore invoke either by logging in to our computers (local services) or through the WWW (global services). These are what most readily come to mind as net-based services.

They include first and foremost programs which run under Unix, and which also often access resources too large to download to the users' individual machines. We thus generally have two options: local access, where users log in to our computers, or global access through the WWW.

Other considerations, such as the desire to make the services available to an audience wider than our immediate users, as in the case of the Bosnian project and possibly the tagger project, also require the second solution.
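
To make the idea of a global service concrete, here is a minimal sketch in Python of a search that runs on the server, so that only the hits travel over the net and the corpus itself is never downloaded. It is an illustration only and not the interface actually used at the Text Laboratory: the toy corpus, the function names kwic and app, and the query parameter "word" are all assumptions made for this example.

```python
from urllib.parse import parse_qs

# Toy stand-in for a corpus kept on the server.
# Hypothetical data, not one of the Text Laboratory corpora.
CORPUS = "the cat sat on the mat and the dog sat by the door".split()


def kwic(tokens, word, width=2):
    """Return one keyword-in-context line per occurrence of `word`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == word:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def app(environ, start_response):
    """WSGI entry point: read ?word=... and answer with plain-text hits."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    word = params.get("word", [""])[0]  # "word" is an assumed parameter name
    body = "\n".join(kwic(CORPUS, word)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]
```

Any WSGI server could publish app, for instance wsgiref.simple_server.make_server from the Python standard library. A real service would replace the in-memory list with a call to an indexed corpus engine.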

The Oslo Corpus of Bosnian Texts

We take the Bosnian project as our example here, because it encompasses the four main kinds of activity that fall most naturally within the scope of the Text Laboratory.

The best way to introduce readers to this project is to let them try it. We have therefore provided a demo version for those interested not in Bosnian itself, but in the design and potential of the service. Unlike the actual Bosnian corpus, this version works on the SUSANNE annotated corpus of English, which can be used free of charge for research purposes.

[The Oslo Corpus of Bosnian Texts | A demo version of the interface to CQP ]


Even though there is a lot of documentation in the pages above, there are several things that are missing from the point of view of reporting what has actually been done, and why some choices were made. Here we will pursue the matter in the hope that our work may help others with similar questions.



Last modified on November 19th, 1997 by Diana Santos <dianasa@ilf.uio.no>