
How LLMs could be used


An alternative to training Neural Machine Translation (NMT) models from scratch for low-resource language pairs is to build on Large Language Models (LLMs). These models have recently seen a surge of success across generative downstream NLP tasks, including machine translation (MT), and adopting them is a natural progression for low-resource machine translation. By taking advantage of open, pre-trained, multilingual LLMs, you can avoid training NMT models from scratch while potentially reusing the general language knowledge the LLM acquired during pre-training. There is potential in fine-tuning such a model for the Inuktitut-English translation task, as has been done for other language pairs (see the publication "Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages"), but doing so, namely introducing a language the model never saw during pre-training, carries real risk. Nonetheless, although leveraging LLMs for low-resource machine translation remains relatively unexplored, it is worthy of extended research.
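To make this concrete, below is a minimal sketch of what such fine-tuning might look like with the Hugging Face transformers and datasets libraries. The model name (bigscience/bloom-560m), the toy sentence pairs, and all hyperparameters are illustrative assumptions, not recommendations; a real experiment would train on a parallel corpus such as the Nunavut Hansard.

```python
# Minimal sketch: fine-tuning an open multilingual causal LLM for
# Inuktitut-English translation by casting sentence pairs as prompts.
# Model name, data, and hyperparameters are placeholders only.
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "bigscience/bloom-560m"  # assumed small open multilingual LLM

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Toy pairs standing in for a real parallel corpus; in practice these
# would come from the Nunavut Hansard Inuktitut-English corpus.
pairs = [
    {"iu": "ᖁᔭᓐᓇᒦᒃ", "en": "Thank you."},
    # ... many more sentence pairs from the corpus would go here.
]

def to_features(example):
    # Cast each pair as a translation prompt the causal LM can learn from.
    text = (
        "Translate Inuktitut to English.\n"
        f"Inuktitut: {example['iu']}\n"
        f"English: {example['en']}{tokenizer.eos_token}"
    )
    return tokenizer(text, truncation=True, max_length=256)

dataset = Dataset.from_list(pairs).map(to_features, remove_columns=["iu", "en"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="iu-en-llm",
        per_device_train_batch_size=1,
        num_train_epochs=3,
    ),
    train_dataset=dataset,
    # mlm=False yields standard next-token (causal LM) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Framing translation as a prompt lets a causal LLM be adapted without any architectural changes; the flip side, as noted above, is that a language unseen during pre-training may tokenize poorly (Inuktitut syllabics can fragment into many byte-level tokens), which is part of the risk this approach carries.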

Figure: Sample of Inuktitut text from the Nunavut Hansard Inuktitut-English Parallel Corpus 3.0. See publication: https://aclanthology.org/2020.lrec-1.312/
