Boas II is an environment that supports the configuration of proper name recognition capabilities for any language. It is particularly useful for so-called low-density languages, for which few or no computational resources are available. The system requires no external resources or processors and only minimal development time from a speaker of the given language. It takes a pattern-matching approach to proper name recognition. Two sample elicitation screens are shown in the gallery.
Boas II follows the original Boas system in using an expectation-driven knowledge elicitation methodology. The system guides users through the process of providing language-specific information about proper names that is then used to automatically configure a proper name recognizer. Boas II places primary emphasis on names of people, but also covers names of companies, institutions, buildings, locations, geographical names and events (e.g., World War II).
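The description above does not specify how Boas II represents elicited knowledge internally. Purely as an illustration of the general idea, the Python sketch below turns a few hypothetical elicited lists (honorific titles, given names, family names, and words that end company names) into two simple token patterns. The `ElicitedInfo` class, its fields, and the `match_person`/`match_org` patterns are assumptions made for this example, not Boas II's actual design.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ElicitedInfo:
    """Hypothetical container for facts a speaker might supply during elicitation."""
    titles: set = field(default_factory=set)        # honorifics that may precede a name
    given_names: set = field(default_factory=set)
    family_names: set = field(default_factory=set)
    org_suffixes: set = field(default_factory=set)  # words that close company names


def tokenize(text):
    # Keep only word characters, so "Dr." becomes "Dr" and "Ltd." becomes "Ltd".
    return re.findall(r"\w+", text)


def match_person(tokens, i, info):
    """Return the end index of a [title] given-name family-name pattern at i, or None."""
    j = i + 1 if tokens[i] in info.titles else i
    if j + 1 < len(tokens) and tokens[j] in info.given_names and tokens[j + 1] in info.family_names:
        return j + 2
    return None


def match_org(tokens, i, info):
    """Return the end index of capitalized words closed by an org suffix at i, or None."""
    j = i
    while j < len(tokens) and tokens[j][0].isupper() and tokens[j] not in info.org_suffixes:
        j += 1
    if j > i and j < len(tokens) and tokens[j] in info.org_suffixes:
        return j + 1
    return None


def recognize(text, info):
    """Scan left to right, trying each pattern in turn; return (label, span) pairs."""
    tokens = tokenize(text)
    entities, i = [], 0
    while i < len(tokens):
        for label, matcher in (("PERSON", match_person), ("ORGANIZATION", match_org)):
            end = matcher(tokens, i, info)
            if end is not None:
                entities.append((label, " ".join(tokens[i:end])))
                i = end
                break
        else:
            i += 1  # no pattern matched here; advance one token
    return entities


info = ElicitedInfo(
    titles={"Dr"},
    given_names={"Anna", "Jonas"},
    family_names={"Berg", "Lind"},
    org_suffixes={"Ltd"},
)
print(recognize("Dr Anna Berg met Jonas Lind at Nordic Timber Ltd.", info))
# [('PERSON', 'Dr Anna Berg'), ('PERSON', 'Jonas Lind'), ('ORGANIZATION', 'Nordic Timber Ltd')]
```

A real configuration for a given language would need more patterns and language-specific tokenization; this sketch, for instance, would wrongly absorb a capitalized sentence-initial word that directly precedes an organization name.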
An important aspect of the work is compiling inventories of named entity components (e.g., personal names, family names) by means of iterative corpus-based methods. These inventories both support higher-level corpus work and improve the overall functioning of the named entity recognizer. Once even a relatively small amount of information about a language has been provided, the system automatically configures a proper name recognition system, which can then be improved through iterative, corpus-oriented trial and error. The important point for NLP developers working with low- and middle-density languages is that the same system can be used for any language, and no external resources (apart from a corpus) or external processors are required.
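The iterative corpus-based compilation of inventories is not detailed here. One common technique for this kind of task is bootstrapping from seed lists, and the Python sketch below illustrates it under that assumption: a capitalized word that follows a known given name becomes a candidate family name, a capitalized word that precedes a known family name becomes a candidate given name, and the passes repeat until nothing new is found. The function `expand_inventories` and its `min_count` and `max_rounds` parameters are hypothetical, and this is not presented as Boas II's own procedure. In practice a speaker would likely vet candidates; the sketch simply accepts those seen at least `min_count` times.

```python
import re
from collections import Counter


def capitalized_bigrams(corpus):
    """Yield adjacent pairs of capitalized word tokens from each sentence string."""
    for sentence in corpus:
        tokens = re.findall(r"\w+", sentence)
        for a, b in zip(tokens, tokens[1:]):
            if a[0].isupper() and b[0].isupper():
                yield a, b


def expand_inventories(corpus, given, family, min_count=2, max_rounds=5):
    """Grow given-name and family-name inventories from seed lists (hypothetical sketch)."""
    given, family = set(given), set(family)
    for _ in range(max_rounds):
        family_candidates, given_candidates = Counter(), Counter()
        for a, b in capitalized_bigrams(corpus):
            if a in given and b not in family:
                family_candidates[b] += 1  # "Known-Given X" suggests X is a family name
            if b in family and a not in given:
                given_candidates[a] += 1   # "X Known-Family" suggests X is a given name
        # Auto-accept frequent candidates; a speaker would normally review these (assumption).
        new_family = {w for w, c in family_candidates.items() if c >= min_count}
        new_given = {w for w, c in given_candidates.items() if c >= min_count}
        if not new_family and not new_given:
            break  # converged: this pass found nothing new
        family |= new_family
        given |= new_given
    return given, family


corpus = [
    "Anna Berg arrived.",
    "Yesterday Anna Berg spoke.",
    "Anna Holm wrote.",
    "Anna Holm replied.",
    "Erik Holm left.",
    "Erik Holm returned.",
]
given, family = expand_inventories(corpus, {"Anna"}, {"Berg"})
print(sorted(given), sorted(family))
# ['Anna', 'Erik'] ['Berg', 'Holm']
```

The second round only discovers "Erik" because the first round added "Holm"; this chaining is what makes the procedure iterative, and it is also why a frequency threshold (or human review) matters, since an early wrong entry would propagate.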
Like Boas, Boas II was developed as a prototype system and is not distributable, though we seek the opportunity to expand it and make it so.