We have recently begun creating libraries of gold- and bronze-standard TMRs: text meaning representations that are generated automatically by the OntoSem analyzer and then manually checked and, if needed, corrected by people.
Whereas gold-standard TMRs include all aspects of semantic and pragmatic analysis (including reference resolution, the interpretation of indirect speech acts, etc.), bronze-standard TMRs are limited to lexical disambiguation and the establishment of the semantic dependency structure.
Gold- and bronze-standard TMRs can be used as input to knowledge-based reasoning engines and statistical processing, and they can be used for the study of different language phenomena.
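As a rough illustration of the gold/bronze distinction, the following Python sketch labels a checked TMR by which aspects of analysis a person has verified. It is only a sketch under stated assumptions: the class name `CheckedTMR`, the aspect labels, and the `standard` method are hypothetical and are not taken from the OntoSem system, and the gold list covers only the aspects named explicitly in the text (the original "etc." is left open).

```python
from dataclasses import dataclass, field

# Hypothetical aspect labels, named after the phenomena mentioned in the text;
# they are NOT identifiers from the OntoSem system.
BRONZE_ASPECTS = frozenset({
    "lexical_disambiguation",
    "semantic_dependency_structure",
})
# Illustrative subset only: the text says gold covers "all aspects" of
# semantic and pragmatic analysis and names just these two extra examples.
GOLD_ASPECTS = BRONZE_ASPECTS | {
    "reference_resolution",
    "indirect_speech_acts",
}


@dataclass
class CheckedTMR:
    """An automatically generated TMR plus a record of what a person verified."""
    text: str
    verified_aspects: set = field(default_factory=set)

    def standard(self) -> str:
        """Return 'gold', 'bronze', or 'unchecked' based on the verified aspects."""
        if GOLD_ASPECTS <= self.verified_aspects:
            return "gold"
        if BRONZE_ASPECTS <= self.verified_aspects:
            return "bronze"
        return "unchecked"
```

Under these assumptions, a TMR whose word senses and dependency structure have been checked would be labeled bronze, and it would move to gold only once the remaining aspects have been verified as well.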
There are many issues involved in creating gold- and bronze-standard TMRs, apart from the actual work involved in carrying out the process. For example:
- What is the best method for selecting texts? Clearly, it would not be wise to start with the longest, most difficult texts available since the automated portion of analysis would be prone to more errors than if the texts were simpler.
- Considering that we do not yet have microtheories to cover every single language phenomenon (e.g., irony), what should we consider to be the gold standard?
- How much lexicon and ontology development do we want to carry out during the creation of gold- and bronze-standard TMRs and who will do it: those working on the TMRs or others?
- If a new sense of a word is needed to cover some input, should all senses of that word be acquired at once or should only the needed sense be acquired, postponing the rest for later? (The ordering of acquisition is a relentless problem.)
- To what extent do we want to pursue knowledge and/or processor debugging during the process of creating gold-standard TMRs?
- And the list goes on...
From past experience we know that finding a "perfect" answer to any of these questions is impossible, so we are making decisions that seem reasonable and that will let us move the work forward, both on this local task and on system building overall.