Evaluation of NLP systems: Feedback on the tutorial

Diana Santos, Linguateca

Dona Scott called my attention to how imprecisely "word with error" was defined in the presentation of the spelling checker evaluation: if two words have been wrongly merged into one (as in "intoone"), how many errors are there, and how many words?
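
The ambiguity can be made concrete with a minimal sketch. It is not taken from the tutorial: the aligned data and the names ALIGNED, count_errors and count_words are hypothetical, and the three counting conventions are only some of the possible ones. Each item pairs the words the writer intended with the words actually typed.

```python
# Hypothetical alignment between intended words and typed words.
ALIGNED = [
    (("merged",), ("merged",)),       # correct word
    (("into", "one"), ("intoone",)),  # two words wrongly merged into one
    (("word",), ("wrod",)),           # ordinary misspelling
]


def count_errors(aligned):
    """Count errors under three different (assumed) conventions."""
    wrong = [(intended, typed) for intended, typed in aligned if intended != typed]
    return {
        # one error per misspelling event, whatever its size
        "error_events": len(wrong),
        # words with error, counted on the text as typed
        "typed_words_with_error": sum(len(typed) for _, typed in wrong),
        # words with error, counted on the text as intended
        "intended_words_with_error": sum(len(intended) for intended, _ in wrong),
    }


def count_words(aligned):
    """Count the total number of words on each side of the alignment."""
    return {
        "typed_words": sum(len(typed) for _, typed in aligned),
        "intended_words": sum(len(intended) for intended, _ in aligned),
    }
```

With this data, "intoone" counts as one word with error if the typed text is the basis and as two if the intended text is, and the total number of words changes in the same way, so any error rate depends on which convention is stated.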

Pedro Moura called my attention to the distinction, only implicit in the tutorial, between evaluating a new application (to see whether it was worth pursuing) and evaluating a particular application for which there was already consensus that it would be useful. (The example involved Resnik's multilingual gisting vs. e.g. MT in general.)

António Colaço pointed out the lack of clarity in the foil about Hindle & Rooth's work: instead of "followed by preposition", one should have written "NPs followed by preposition".

Alexsandro Soares pointed out that in the neural network community three kinds of corpora are standardly used, and pointed me to ftp://ftp.sas.com/pub/neural/FAQ.html#A_data, from which I reproduce the following excerpt:

There is no book in the NN literature more authoritative than Ripley (1996), from which the following definitions are taken (p.354):

Training set: A set of examples used for learning, that is to fit the parameters [i.e., weights] of the classifier.
Validation set: A set of examples used to tune the parameters [i.e., architecture, not weights] of a classifier, for example to choose the number of hidden units in a neural network.
Test set: A set of examples used only to assess the performance [generalization] of a fully-specified classifier.
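
As an illustration of the three definitions, here is a minimal sketch of a three-way split of a collection of examples. It comes neither from Ripley nor from the FAQ: the function name three_way_split, the default fractions and the fixed seed are all assumptions made for the example.

```python
import random


def three_way_split(examples, validation_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the examples and cut them into training, validation and test sets.

    The fractions and the seed are illustrative defaults, not recommendations.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed, so the split is repeatable

    n_test = int(len(shuffled) * test_fraction)
    n_validation = int(len(shuffled) * validation_fraction)

    test = shuffled[:n_test]                            # only for the final assessment
    validation = shuffled[n_test:n_test + n_validation]  # for tuning, e.g. hidden units
    training = shuffled[n_test + n_validation:]         # for fitting the weights
    return training, validation, test
```

For example, three_way_split(range(10)) returns three disjoint lists of 6, 2 and 2 examples. The point of the split is that the test set plays no role in fitting or tuning, so it can only be used to assess the fully specified classifier.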

He also mentioned that it might be interesting to compare evaluation contests within NLP with programming language contests like the one at http://cristal.inria.fr/ICFP2001/prog-contest/, held in connection with ICFP 2001, the International Conference on Functional Programming.

Also, Alexsandro called my attention to the EAGLES Final Report at http://issco-www.unige.ch/ewg95/ewg95.html, of which I was unfortunately unaware during the tutorial's preparation.

I have also since read, and consider highly relevant, the discussion of cost-benefit analysis and of technology assessment in general in Frederick Ferré's book on the philosophy of technology. (See the References page.)

Last modified on November 10, 2003 by Diana Santos, Diana.Santos@sintef.no