Automatic Corpus-based Acquisition of Binary Terms

ACABIT is a terminology extraction program. It takes as input a linguistically annotated corpus and outputs a list of multi-word term (MWT) candidates, ranked by log-likelihood score from the most representative of the corpus to the least. For each MWT candidate, an XML structure is provided that gathers its base structures and all the variations encountered in the corpus.
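As an illustration only, here is a minimal sketch of a log-likelihood association score computed over a 2x2 contingency table of pair counts (the form popularised by Dunning, 1993), followed by a ranking of candidates. The function names and the counts are hypothetical, and ACABIT's actual counting of pairs and variants may differ.

```python
import math


def xlogx(x: float) -> float:
    """Return x * ln(x), using the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def loglike(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood association score for a candidate pair (w1, w2).

    a: pairs made of w1 followed by w2
    b: pairs with w1 but a word other than w2
    c: pairs with w2 but a word other than w1
    d: pairs containing neither w1 nor w2
    Returns half of Dunning's G2 statistic (0 when w1 and w2 are independent).
    """
    n = a + b + c + d
    return (xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d)
            - xlogx(a + b) - xlogx(c + d)   # row totals
            - xlogx(a + c) - xlogx(b + d)   # column totals
            + xlogx(n))


# Hypothetical counts (a, b, c, d) for three candidates; not real corpus data.
candidates = {
    "réseau local": (30, 20, 10, 940),
    "cas général": (5, 95, 45, 855),
    "signal numérique": (12, 8, 18, 962),
}

# Rank from the most to the least strongly associated candidate.
ranked = sorted(candidates, key=lambda t: loglike(*candidates[t]), reverse=True)
for term in ranked:
    print(f"{term}\t{loglike(*candidates[term]):.2f}")
```

The constant factor 2 of the usual G2 statistic is omitted here because it does not change the ranking; in this toy data, "cas général" has exactly the counts expected under independence, so its score is 0.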

ACABIT relies on the following programs:

  • For French:
      • Brill's POS tagger for French (ATILF)
      • French lemmatizer FLEMM (WARNING: the output format of FLEMM has been modified; you need to use FLEMM-v2.0 (1999))
      • Conversion to XML format (this Perl script can be used)
  • For English:
      • Brill's POS tagger (BRILL)
      • Lemmatizer: the CELEX lexical database



  • Japanese ACABIT, by Koichi Takeuchi, University of Okayama, Japan: JACABIT
  • ACABIT for Malagasy: please contact me


To understand how ACABIT works, please read some of my publications, for example:

[Daille, B. 2003b] B. Daille, "Conceptual structuring through term variations". In F. Bond, A. Korhonen, D. McCarthy and A. Villavicencio (eds.), Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, pp. 9-16, 2003.

