Ancient Sumerian NMT
This research project addresses the challenge of translating Ancient Sumerian, a low-resource language isolate, into modern Turkish.
Technical Deep Dive
- Transformer Architecture: Fine-tuned a pretrained T5 model, using transfer learning to compensate for sparse training data.
- Custom Tokenization: Developed a tokenizer specifically tailored to Sumerian transliteration.
- Complex Data Pipeline: Engineered a pipeline spanning several programming languages (Python, C, Perl) to harmonize data from diverse academic sources (ORACC, CDLI, ETCSL).
- Evaluation: Measured translation quality with BLEU, METEOR, and WER (word error rate).
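
Two of the points above can be illustrated with a short, self-contained Python sketch. This is not the project's actual code. The function `tokenize_transliteration` shows one plausible sign-level scheme for transliterated text, assuming that words are separated by spaces and signs within a word by hyphens; the real tokenizer may work quite differently. The function `word_error_rate` implements the standard WER definition (word-level edit distance divided by reference length). Both function names and the example strings are made up for illustration.

```python
def tokenize_transliteration(line: str) -> list[str]:
    """Split a transliterated line into sign-level tokens.

    Assumed (hypothetical) convention: whitespace separates words and
    hyphens separate signs inside a word. Non-initial signs keep a leading
    hyphen so that word boundaries can be recovered from the token list.
    """
    tokens: list[str] = []
    for word in line.split():
        signs = [s for s in word.split("-") if s]  # drop empty pieces
        tokens.extend(s if i == 0 else "-" + s for i, s in enumerate(signs))
    return tokens


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative strings only, not taken from the project's data.
    print(tokenize_transliteration("lugal-e e2 mu-un-du3"))
    print(word_error_rate("kral bir ev kurdu", "kral evi kurdu"))
```

Keeping the hyphen on non-initial signs is one simple way to make the tokenization reversible. WER can exceed 1.0 when the hypothesis is much longer than the reference, so it is usually read alongside BLEU and METEOR rather than on its own.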
This work combines digital humanities with what was, at the time, state-of-the-art AI to preserve and understand ancient knowledge.