Glossary Strategy for Long Documents
The single largest source of quality loss in long-form AI translation is terminology drift: the same term rendered three different ways across one document. A glossary is the structural answer.
Three stages
- Extract: identify domain terms, named entities, and recurring phrases from the source before translation begins.
- Decide: choose a single target rendering for each term, consistent with the framework chosen during register analysis.
- Enforce: pass the glossary into every chunk's instruction so consistency holds across hundreds of pages.
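The three stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (extract_terms, decide, build_instruction), the candidate list, the "formal legal" framework label, and the instruction wording are all hypothetical, and the frequency filter stands in for whatever term extraction a real pipeline would use.

```python
def extract_terms(source: str, candidates: list[str], min_count: int = 2) -> list[str]:
    """Extract: keep candidate terms that recur in the source.

    Hypothetical stand-in for real term extraction; it only counts
    case-insensitive occurrences of terms from a supplied candidate list.
    """
    lowered = source.lower()
    return [t for t in candidates if lowered.count(t.lower()) >= min_count]


def decide(terms: list[str], renderings: dict[str, str]) -> dict[str, str]:
    """Decide: require exactly one target rendering per extracted term."""
    missing = [t for t in terms if t not in renderings]
    if missing:
        raise ValueError(f"No target rendering chosen for: {missing}")
    return {t: renderings[t] for t in terms}


def build_instruction(chunk: str, glossary: dict[str, str], framework: str) -> str:
    """Enforce: embed the full glossary in the instruction for one chunk."""
    lines = [
        f"Translate the text below. Register: {framework}.",
        "Use exactly these renderings every time the term appears:",
    ]
    lines += [f"- {src} => {tgt}" for src, tgt in glossary.items()]
    lines += ["", chunk]
    return "\n".join(lines)


# Illustrative English-to-French run with an assumed candidate list.
source = (
    "The data controller informs the data subject. "
    "The data controller keeps records. "
    "The data subject may object."
)
terms = extract_terms(source, ["data controller", "data subject", "processor"])
glossary = decide(
    terms,
    {
        "data controller": "responsable du traitement",
        "data subject": "personne concernée",
    },
)
chunks = source.split(". ")  # naive chunking, for illustration only
instructions = [build_instruction(c, glossary, "formal legal") for c in chunks]
```

Every instruction carries the same glossary, so a chunk translated on page 300 sees the same renderings as a chunk on page 1. Raising an error in decide when a term has no rendering is a design choice in this sketch: it surfaces undecided terms before translation starts rather than letting the model improvise them.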
Why a glossary alone is not enough
A glossary without a chosen framework produces consistent but tonally wrong output. A framework without a glossary produces tonally right but terminologically drifting output. The two compose, framework first: the glossary's renderings are chosen against the register the framework sets.
Further reading: how the workflow uses glossaries, root causes of consistency loss.