Conclusions

Some tasks performed by humans or by systems, for instance those related to discourse interpretation, may not have a single, 100% correct answer.

However, systems developed to perform such tasks are usually evaluated using recall and precision figures.
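As a reminder of what these figures measure, the following is a minimal sketch, not the evaluation procedure used in this work. It assumes, hypothetically, that each annotation can be represented as a hashable item such as a `(start, end, label)` tuple, and it compares a system's output with a reference set. Precision is the fraction of system answers found in the reference; recall is the fraction of reference answers the system recovered. The function name `precision_recall` and the toy data are illustrative only.

```python
from typing import Hashable, Set, Tuple


def precision_recall(system: Set[Hashable], gold: Set[Hashable]) -> Tuple[float, float]:
    """Return (precision, recall) of system annotations against a reference set.

    Hypothetical representation: each annotation is a hashable item,
    e.g. a (start, end, label) tuple. Empty sets yield 0.0 by convention.
    """
    true_positives = len(system & gold)
    # Precision: share of the system's answers that appear in the reference.
    precision = true_positives / len(system) if system else 0.0
    # Recall: share of the reference answers that the system produced.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


gold = {(0, 3, "antecedent"), (5, 7, "antecedent"), (9, 12, "antecedent")}
system = {(0, 3, "antecedent"), (5, 8, "antecedent")}
p, r = precision_recall(system, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.33
```

Under exact matching, the near-miss `(5, 8)` counts as wrong even if a human might accept it, which is where tasks without a single correct answer strain these figures.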

To perform such evaluations, standard (reference) annotations are derived from the annotations provided by human coders.
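The text does not specify how the coders' annotations are combined. One common option is a majority vote, sketched below purely as an illustrative assumption and not as the derivation used here. Each coder's annotations are assumed to be a set of hashable items, and an item enters the reference only if a strict majority of coders marked it. The name `majority_gold_standard` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Set


def majority_gold_standard(coder_annotations: Iterable[Set[Hashable]]) -> Set[Hashable]:
    """Keep every item annotated by more than half of the coders.

    Hypothetical aggregation rule (strict majority vote), for illustration only.
    """
    annotations = list(coder_annotations)
    # Each coder contributes at most one vote per item, since inputs are sets.
    votes = Counter(item for coder in annotations for item in coder)
    needed = len(annotations) // 2 + 1  # strict majority of coders
    return {item for item, count in votes.items() if count >= needed}


coders = [
    {"e1", "e2", "e3"},
    {"e1", "e3"},
    {"e1", "e4"},
]
print(sorted(majority_gold_standard(coders)))  # ['e1', 'e3']
```

With this rule, items on which the coders disagree (`e2` and `e4` above) are dropped from the reference, so a system answer that one coder considered acceptable would still be scored as an error.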