TCL's Computational Lexicon


A Basic Lexical Knowledge Base for Natural Language Processing Research


  • Design both terminology and ontology for structuring the lexicon
  • Provide three levels of information: morphological, syntactic, and semantic
  • Systematically discriminate word senses using a set of semantic and logical constraints
  • Offer statistical corpus-based tools to assist lexicographers
The TCL's computational lexicon is a lexical knowledge base that aims to serve as a fundamental linguistic resource for natural language processing (NLP) research. We design both terminology and ontology for structuring the lexicon based on the idea of computability and reusability.

Richness of Information
The TCL's computational lexicon consists of more than 60,000 Thai-English lexical entries, each formally represented at three levels of information: morphological, syntactic, and semantic. The morphological information indicates the type of word composition. The syntactic information gives grammatical categories and subcategories, as well as verb patterns in sentence structures. The semantic information provides a set of logical and semantic constraints for discriminating word senses. The logical constraints deal with the absence of relatedness between word meanings, whereas the semantic constraints try to capture the preferences of syntactic arguments for thematic roles.
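As a rough illustration of the three levels, an entry could be modeled along the following lines. This is a minimal sketch in Python, not the lexicon's actual schema: the class names, field names, and all sample values (including the composition label and the constraint fillers) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Morphology:
    # Type of word composition; "simple" is a hypothetical label.
    composition: str


@dataclass
class Syntax:
    category: str                       # grammatical category
    subcategory: Optional[str] = None   # grammatical subcategory
    verb_pattern: Optional[str] = None  # pattern in sentence structure


@dataclass
class Semantics:
    # Constraint code -> related words or classes (codes as in the constraint table).
    logical: Dict[str, List[str]] = field(default_factory=dict)
    semantic: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class LexicalEntry:
    thai: str
    english: str
    morphology: Morphology
    syntax: Syntax
    semantics: Semantics


# Invented sample entry; the values are not taken from the real lexicon.
entry = LexicalEntry(
    thai="กิน",
    english="eat",
    morphology=Morphology(composition="simple"),
    syntax=Syntax(category="verb", subcategory="transitive"),
    semantics=Semantics(
        logical={"ISA": ["consume"]},
        semantic={"AGT": ["animate"], "OBJ": ["food"]},
    ),
)
```

Keeping the three levels in separate records mirrors the description above and lets a tool read or update one level without touching the others.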

Web-Based Editor and Tools
To acquire the semantic information in the lexicon, semi-automatic and automatic methods are necessary to meet the requirements of practical lexicography. In addition to the collaborative Web-based editor, we provide lexicographers with statistical corpus-based tools for inserting, updating, and refining lexical entries.

Extending to Multilingual Lexicon
We plan to link the TCL's computational lexicon with lexicons from neighboring countries, extending it from a bilingual to a multilingual lexicon. Research communities and real-world applications can benefit from this sharable resource.



Our current logical and semantic constraints are listed in the following table.

Logical Constraints
  Is-a (ISA)          a conceptual class of a given word
  Equal (EQU)         a word that has the same or a similar meaning as a given word
  Not-equal (NEQ)     a word that has the opposite meaning to a given word
  Part-of (POF)       a word that specifies a part of a given word
  Whole-of (WOF)      a word that refers to the whole of which a given word is a part

Semantic Constraints
  Agent (AGT)         an entity that initiates the action
  Object (OBJ)        an entity that is affected by the action
  Instrument (INS)    an entity that is used in the action
  Location (LOC)      a position or place where an event occurs
  Time (TIM)          a point or period of time when an event occurs
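
The constraint codes above lend themselves to a simple encoding. The sketch below, in Python, lists them as enumerations and shows one naive way semantic constraints might help pick a word sense: count how many observed role fillers match each sense's preferred fillers. The scoring rule, the function names (`score`, `best_sense`), and the sample senses are assumptions for illustration; the source does not describe the actual discrimination procedure.

```python
from enum import Enum
from typing import Dict


class Logical(Enum):
    ISA = "is-a"
    EQU = "equal"
    NEQ = "not-equal"
    POF = "part-of"
    WOF = "whole-of"


class Semantic(Enum):
    AGT = "agent"
    OBJ = "object"
    INS = "instrument"
    LOC = "location"
    TIM = "time"


# Thematic role -> preferred class of the argument filling that role.
Constraints = Dict[Semantic, str]


def score(sense: Constraints, observed: Constraints) -> int:
    """Count observed roles whose filler class equals the sense's preference."""
    return sum(1 for role, cls in observed.items() if sense.get(role) == cls)


def best_sense(senses: Dict[str, Constraints], observed: Constraints) -> str:
    """Return the name of the sense with the highest match count."""
    return max(senses, key=lambda name: score(senses[name], observed))


# Invented senses and filler classes, for illustration only.
senses = {
    "draw/sketch": {Semantic.AGT: "human", Semantic.OBJ: "picture", Semantic.INS: "pencil"},
    "draw/pull": {Semantic.AGT: "animal", Semantic.OBJ: "vehicle"},
}
observed = {Semantic.AGT: "human", Semantic.OBJ: "picture"}
chosen = best_sense(senses, observed)
```

With these made-up values, the "sketch" sense matches both observed roles and the "pull" sense matches neither, so the former is selected. A practical system would likely need class hierarchies (via ISA) and corpus statistics rather than exact string matches.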