The TCL's computational lexicon is a lexical knowledge base that
aims to serve as a fundamental linguistic resource for natural
language processing (NLP) research. We design both the terminology
and the ontology that structure the lexicon around the principles of
computability and reusability.
Richness of Information
The TCL's computational lexicon consists
of more than 60,000 Thai-English lexical entries, which are
formally represented in three levels of information:
morphological, syntactic, and semantic information. The
morphological information indicates types of word composition. The
syntactic information gives grammatical categories and
subcategories, and verb patterns in sentence structures. The
semantic information provides a set of logical and semantic
constraints for discriminating word senses. The logical
constraints handle word senses whose meanings are unrelated,
whereas the semantic constraints express the preferences for the
syntactic arguments that fill each thematic role.
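The three levels of information can be pictured as a simple record structure. The following Python sketch is only an illustration under stated assumptions: the class names, field names, the pick_sense helper, and the sample entry are all hypothetical and are not taken from the lexicon's actual schema or data. It shows one way that thematic-role preferences could be used to choose among the senses of an entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sense:
    """One word sense with hypothetical semantic constraints."""
    gloss: str
    # Preferred semantic class per thematic role, e.g. {"agent": "animate"}.
    role_preferences: Dict[str, str] = field(default_factory=dict)


@dataclass
class LexicalEntry:
    """A Thai-English entry carrying the three levels of information."""
    thai: str
    english: str
    composition: str   # morphological: type of word composition
    category: str      # syntactic: grammatical category
    subcategory: str   # syntactic: subcategory
    verb_patterns: List[str] = field(default_factory=list)  # syntactic
    senses: List[Sense] = field(default_factory=list)       # semantic


def pick_sense(entry: LexicalEntry, observed_roles: Dict[str, str]) -> Sense:
    """Return the sense whose role preferences match the most observed arguments.

    Ties are resolved in favor of the sense listed first.
    """
    def score(sense: Sense) -> int:
        return sum(
            1
            for role, semantic_class in sense.role_preferences.items()
            if observed_roles.get(role) == semantic_class
        )
    return max(entry.senses, key=score)


# Invented sample entry, for illustration only (not real lexicon data).
entry = LexicalEntry(
    thai="กิน",
    english="eat",
    composition="simple",
    category="verb",
    subcategory="transitive",
    verb_patterns=["SUBJ V OBJ"],
    senses=[
        Sense("ingest food", {"agent": "animate", "theme": "food"}),
        Sense("consume a resource", {"theme": "resource"}),
    ],
)

chosen = pick_sense(entry, {"agent": "animate", "theme": "food"})
print(chosen.gloss)
```

Counting matched roles is a deliberately simple scoring rule. A practical system would more likely weight the preferences with corpus statistics.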
Web-Based Editor and Tools
For acquiring the semantic information in the lexicon,
semi-automatic and automatic methods are necessary to meet the
requirements of practical lexicography. In addition to the
collaborative Web-based editor, we provide lexicographers with
statistical corpus-based tools for inserting, updating, and
refining lexical entries.
Extending to Multilingual Lexicon
We plan to link the TCL's computational lexicon with lexicons
from neighboring countries, extending it from a bilingual to a
multilingual lexicon. Research communities and
real-world applications can benefit from this sharable resource.