MLRS Word Sketches

Maltese Automatic Collocations Dictionary

Lexical Computing Limited, October 2012

This is an Automatic Collocations Dictionary produced by Lexical Computing Limited, for delivery to the EU CESAR project.
The method is:
• Take a corpus of the language in question
• Lemmatise and part-of-speech-tag it
• Load it into the Sketch Engine
• Apply a ‘sketch grammar’ (regular expressions over part-of-speech tags). A sketch grammar, when applied to a corpus, identifies a set of collocations, e.g. <headword, grammatical relation, collocate> triples.
• For all lexical words of sufficient frequency, list all collocations they participate in
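
The steps above can be illustrated with a small Python sketch. It is not the Sketch Engine implementation: the toy corpus is English, the tag names (DET, ADJ, NOUN, VERB, PRON), the two grammar rules, the frequency threshold and all function names are assumptions made for illustration, and raw counts stand in for whatever association score a real system would use to rank collocates.

```python
import re
from collections import Counter

# Toy lemmatised, POS-tagged corpus: one list of (lemma, tag) pairs per sentence.
# English stand-in with made-up tags; the real MLRS tagset is different.
SENTENCES = [
    [("the", "DET"), ("strong", "ADJ"), ("tea", "NOUN"), ("cool", "VERB")],
    [("she", "PRON"), ("drink", "VERB"), ("strong", "ADJ"), ("tea", "NOUN")],
    [("he", "PRON"), ("drink", "VERB"), ("the", "DET"), ("coffee", "NOUN")],
]

# Hypothetical sketch grammar: relation name -> regular expression over a
# sentence encoded as "lemma/TAG lemma/TAG ... " (note the trailing space).
TOKEN = r"[^/\s]+"
GRAMMAR = {
    "modifier": re.compile(
        rf"(?<!\S)(?P<collocate>{TOKEN})/ADJ (?P<head>{TOKEN})/NOUN "
    ),
    "object_of": re.compile(
        rf"(?<!\S)(?P<collocate>{TOKEN})/VERB "
        rf"(?:{TOKEN}/(?:DET|ADJ) )*(?P<head>{TOKEN})/NOUN "
    ),
}

# Tags treated as "lexical words" when counting headword frequency (assumed).
LEXICAL_TAGS = {"NOUN", "VERB", "ADJ"}


def extract_triples(sentences, grammar):
    """Count <headword, grammatical relation, collocate> triples."""
    triples = Counter()
    for sentence in sentences:
        encoded = "".join(f"{lemma}/{tag} " for lemma, tag in sentence)
        for relation, pattern in grammar.items():
            for match in pattern.finditer(encoded):
                key = (match.group("head"), relation, match.group("collocate"))
                triples[key] += 1
    return triples


def build_dictionary(sentences, grammar, min_freq=2):
    """Keep collocations only for headwords occurring at least min_freq times."""
    freq = Counter(
        lemma
        for sentence in sentences
        for lemma, tag in sentence
        if tag in LEXICAL_TAGS
    )
    entries = {}
    for (head, relation, collocate), count in extract_triples(sentences, grammar).items():
        if freq[head] >= min_freq:
            entries.setdefault(head, []).append((relation, collocate, count))
    return entries


if __name__ == "__main__":
    for headword, collocations in build_dictionary(SENTENCES, GRAMMAR).items():
        print(headword, collocations)
```

In this toy run only "tea" is frequent enough to receive an entry; "coffee" occurs once and is dropped by the threshold, which mirrors the "sufficient frequency" condition in the last step.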

In this case:
• The Maltese MLRS Corpus was developed at the University of Malta by Claudia Borg, Albert Gatt, et al. It contains about 111 million words and was processed with their lemmatizer and tagger for Maltese.
• The sketch grammar was prepared by Jan Joachimsen.
• The dictionary has entries for 12,553 headwords, with an average of 5.7 collocations per headword.
• The entry for each collocation includes pointers to its corpus examples on the Sketch Engine website.
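
As a rough sanity check on the size of the dictionary, the two figures above imply a total number of collocation entries. The short calculation below is only an estimate: it assumes the average of 5.7 is rounded to one decimal place, so the true average lies somewhere between 5.65 and 5.75. The variable names are illustrative.

```python
# Figures quoted in the list above.
HEADWORDS = 12_553
AVG_COLLOCATIONS = 5.7  # assumed to be rounded to one decimal place

# Point estimate and the range implied by the rounding of the average.
estimate = HEADWORDS * AVG_COLLOCATIONS
low = HEADWORDS * 5.65
high = HEADWORDS * 5.75

print(f"about {estimate:,.0f} collocations (between {low:,.0f} and {high:,.0f})")
```

This puts the dictionary at roughly 71,000 to 72,000 collocation entries in total.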

