liminfo

XLIFF Reference

About XLIFF Reference

The XLIFF Reference is a searchable quick reference covering the major XML-based localization interchange formats: XLIFF, TMX, TBX, and related standards. It is organized into four categories:

  • XLIFF (XML Localization Interchange File Format): the xliff root element, file containers with source/target language attributes, trans-unit segments with approval states, source/target elements with translation state tracking, the group element for organizing translation units, note elements for translator/developer comments, inline formatting tags such as g/x/bx/ex/ph, alt-trans for TM matches, XLIFF 2.0 differences with the segment/unit model, and the roundtrip extraction-translation-merge workflow
  • TMX (Translation Memory eXchange): the tmx/header/body structure, tu translation units with tuid and changedate, tuv per-language variants with creation metadata, and prop custom properties for domain/client tagging
  • TBX (TermBase eXchange): the ISO 30042 standard structure, conceptEntry for term entries, langSec/termSec for multilingual terms, and termNote metadata for part of speech and administrative status
  • Other Standards: SRX segmentation rules, the ITS internationalization tag set, XLIFF editing/validation tools, the Okapi Framework for format conversion, and encoding considerations for UTF-8 compliance

These formats are the backbone of the translation and localization industry. XLIFF serves as the interchange format between content management systems and CAT (Computer-Assisted Translation) tools such as Trados Studio (formerly SDL Trados), memoQ, and OmegaT, allowing translators to work on extracted source text while the original document formatting is preserved. TMX enables translation memory exchange between different CAT tools, so previously translated segments can be reused across projects and vendors. TBX standardizes the exchange of terminology databases, which helps keep term usage consistent across an organization. Understanding these formats is essential for localization engineers, translation project managers, and developers building internationalized applications.

The reference includes practical XML examples for every element, showing the actual markup structure that localization engineers encounter in production files. The XLIFF section demonstrates how trans-unit IDs track individual segments, how state attributes (new, needs-translation, translated, needs-review-translation, final, signed-off) manage translation workflow, and how inline elements preserve formatting tags from the original document. The TMX section shows how translation units store bilingual segment pairs with metadata, and the TBX section illustrates concept-oriented terminology entries. Coverage of the Okapi Framework explains the file-to-XLIFF roundtrip process for 50+ file formats. All content is browsable with instant search and dark mode support.

Key Features

  • XLIFF format reference: xliff root element, file containers, trans-unit segments, source/target with state tracking, group/note elements, inline tags (g/x/bx/ex/ph)
  • XLIFF workflow features: alt-trans for TM match storage, XLIFF 2.0 segment/unit model differences, and the complete roundtrip (extract-translate-merge) process
  • TMX format reference: tmx/header/body structure, tu translation units with tuid/changedate, tuv per-language variants, and prop custom metadata properties
  • TBX format reference: ISO 30042 structure, conceptEntry/langSec/termSec hierarchy, termNote for part of speech and administrative status (preferred/admitted/deprecated)
  • Related standards: SRX segmentation rules for sentence splitting, ITS internationalization tag set for translate/locNote/term markup
  • Tool coverage: Okapi Framework (Rainbow GUI, Tikal CLI) for file format conversion to XLIFF, plus XLIFF editors and validators
  • Encoding guidance: UTF-8 requirements for TMX/XLIFF/TBX, BOM handling, XML entity escaping, and CDATA usage for special characters
  • Instant search across all localization format elements with category filtering and no server processing
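
The encoding guidance above (BOM handling and XML entity escaping) can be illustrated with a short Python sketch. It uses only the standard library; the helper names read_xml_text and make_source are hypothetical, and the sketch assumes the input bytes are UTF-8.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def read_xml_text(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading byte order mark if one is present."""
    # The "utf-8-sig" codec accepts input with or without a BOM.
    return data.decode("utf-8-sig")


def make_source(text: str) -> str:
    """Wrap text in a source element, escaping &, < and > so the XML stays well-formed."""
    return f"<source>{escape(text)}</source>"


# A BOM-prefixed fragment decodes to clean text.
decoded = read_xml_text(b"\xef\xbb\xbf<seg>caf\xc3\xa9</seg>")

# Escaped text survives a parse roundtrip unchanged.
markup = make_source("Terms & <Conditions>")
roundtrip = ET.fromstring(markup).text
```

Escaping at write time is usually simpler than wrapping text in CDATA, because the parser returns the original characters either way.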

Frequently Asked Questions

What localization formats does this reference cover?

The reference covers four areas: XLIFF (XML Localization Interchange File Format) for translation data exchange between CMS and CAT tools, TMX (Translation Memory eXchange) for sharing translation memories between tools, TBX (TermBase eXchange, ISO 30042) for terminology database interchange, and related standards including SRX for segmentation rules, ITS for internationalization metadata, and the Okapi Framework for file format conversion. Each format is documented with its XML structure, elements, attributes, and practical markup examples.

What is the difference between XLIFF 1.2 and XLIFF 2.0?

XLIFF 2.0 introduced structural changes that are not backward-compatible with 1.2. The trans-unit element was replaced by a unit/segment hierarchy, the source and target languages moved from the file element to srcLang/trgLang attributes on the xliff root, the 1.2 inline elements (g, x, bx, ex, ph, and others) were reduced to a smaller set (pc, sc, ec, ph), the schema became stricter, and features outside the core were moved into optional modules (such as Translation Candidates, which takes over the role of alt-trans). The file/group/unit/segment hierarchy in 2.0 provides more granular control over translation segments. However, XLIFF 1.2 remains more widely supported by CAT tools in practice, so conversion between versions is sometimes necessary.
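
To make the structural difference concrete, here is a minimal Python sketch that reads the same segment from a 1.2 document and a 2.0 document with the standard library. The sample documents, IDs, and the helper names pairs_12 and pairs_20 are illustrative assumptions, not content taken from the reference entries.

```python
import xml.etree.ElementTree as ET

# Hypothetical XLIFF 1.2 sample: languages sit on <file>, one trans-unit holds one pair.
XLIFF_12 = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="hello.txt" source-language="en" target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Hello</source>
        <target state="translated">Bonjour</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

# Hypothetical XLIFF 2.0 sample: languages sit on the root, a unit wraps segments.
XLIFF_20 = """<xliff version="2.0" xmlns="urn:oasis:names:tc:xliff:document:2.0"
       srcLang="en" trgLang="fr">
  <file id="f1">
    <unit id="1">
      <segment state="translated">
        <source>Hello</source>
        <target>Bonjour</target>
      </segment>
    </unit>
  </file>
</xliff>"""

NS_12 = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
NS_20 = {"x": "urn:oasis:names:tc:xliff:document:2.0"}


def pairs_12(xml_text):
    """Return (id, source, target) for every trans-unit in a 1.2 document."""
    root = ET.fromstring(xml_text)
    return [
        (tu.get("id"),
         tu.findtext("x:source", namespaces=NS_12),
         tu.findtext("x:target", namespaces=NS_12))
        for tu in root.iterfind(".//x:trans-unit", NS_12)
    ]


def pairs_20(xml_text):
    """Return (unit id, source, target) for every segment in a 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        (unit.get("id"),
         seg.findtext("x:source", namespaces=NS_20),
         seg.findtext("x:target", namespaces=NS_20))
        for unit in root.iterfind(".//x:unit", NS_20)
        for seg in unit.iterfind("x:segment", NS_20)
    ]
```

Both helpers return the same tuples for these samples, which shows that the translatable content is equivalent even though the element names and namespaces differ.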

How do trans-unit states track translation progress?

The target element's state attribute records where a segment is in the translation workflow. XLIFF 1.2 does not prescribe an order, but a typical progression is: new (newly added, not yet translated), needs-translation (marked for translation), translated (initial translation complete), needs-review-translation (translation needs review), final (finished), and signed-off (formally accepted). The specification also defines adaptation and localization variants such as needs-adaptation, needs-l10n, needs-review-adaptation, and needs-review-l10n. Separately, the trans-unit has an approved attribute (yes or no) for binary approval tracking. These states enable workflow automation in translation management systems, where segments are routed to translators, reviewers, and approvers based on their current state.
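
As a rough illustration of state-based routing, the following Python sketch counts target states and lists approved units in an XLIFF 1.2 document. The sample content and the helper name state_report are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical sample file with three units in different workflow states.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="ui.txt" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="save" approved="yes">
        <source>Save</source>
        <target state="final">Speichern</target>
      </trans-unit>
      <trans-unit id="cancel">
        <source>Cancel</source>
        <target state="needs-review-translation">Abbrechen</target>
      </trans-unit>
      <trans-unit id="help">
        <source>Help</source>
        <target state="new"/>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def state_report(xml_text):
    """Return (Counter of target states, list of trans-unit ids with approved="yes")."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    approved = []
    for tu in root.iterfind(".//x:trans-unit", NS):
        target = tu.find("x:target", NS)
        # A unit may have no target, or a target without a state attribute.
        state = target.get("state") if target is not None else None
        counts[state or "(no state)"] += 1
        if tu.get("approved") == "yes":
            approved.append(tu.get("id"))
    return counts, approved


counts, approved = state_report(SAMPLE)
```

A translation management system could use a report like this to decide which units go to a translator (new, needs-translation) and which go to a reviewer (needs-review-translation).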

How does TMX differ from XLIFF?

XLIFF is designed for active translation projects: it carries source text that needs to be translated, together with workflow states, notes, and inline formatting. TMX is designed for translation memory exchange: it carries already-completed translation pairs (source and target segments) for reuse in future projects. When a new XLIFF file is opened in a CAT tool, the tool searches its translation memories, which are often imported from or exported to TMX, for identical or similar segments that were translated before. Translation memories accumulate over time as a persistent asset, while XLIFF files are project-specific deliverables.
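
The reuse described above can be sketched in Python as an exact-match lookup against a TMX file. Real CAT tools also perform fuzzy matching, which this sketch leaves out. The sample TMX content and the helper names load_memory and exact_matches are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TMX 1.4 sample with one translation unit.
TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu tuid="1" changedate="20240101T120000Z">
      <prop type="x-domain">ui</prop>
      <tuv xml:lang="en"><seg>Save changes</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrer les modifications</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_memory(tmx_text, src_lang, tgt_lang):
    """Build a dict mapping source segment text to target segment text."""
    memory = {}
    for tu in ET.fromstring(tmx_text).iterfind("./body/tu"):
        segs = {}
        for tuv in tu.iterfind("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() flattens any inline markup inside the segment.
                segs[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if src_lang in segs and tgt_lang in segs:
            memory[segs[src_lang]] = segs[tgt_lang]
    return memory


def exact_matches(memory, sources):
    """Return translations for the source strings that appear verbatim in the memory."""
    return {s: memory[s] for s in sources if s in memory}


memory = load_memory(TMX, "en", "fr")
hits = exact_matches(memory, ["Save changes", "Delete"])
```

Only "Save changes" is found here; "Delete" has no entry in the memory and would be left for the translator.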

What are inline elements in XLIFF and why are they important?

Inline elements represent formatting from the original document that must be preserved during translation. The g element wraps paired tags (like bold or hyperlinks), x represents standalone tags (like line breaks), bx/ex mark the beginning and end of a tag pair that cannot be wrapped in a single g (for example, when the pair spans two segments), and ph is a placeholder for non-translatable native code. Translators must keep these elements in their translation. They can reorder them to match target language grammar, but they should not delete or modify them. Broken inline elements can corrupt the merged output document.
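
A simple quality check for this rule is to compare the inline codes in the source with those in the target. The Python sketch below does that for a single XLIFF 1.2 trans-unit; the sample unit and the helper names inline_codes and missing_codes are assumptions, and the check only covers the five elements named above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = "urn:oasis:names:tc:xliff:document:1.2"
INLINE = {"g", "x", "bx", "ex", "ph"}

# Hypothetical unit: the translator reordered the text and dropped the <x/> code.
UNIT = """<trans-unit id="1" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <source>Click <g id="1">Save</g> to continue.<x id="2"/></source>
  <target>Pour continuer, cliquez sur <g id="1">Enregistrer</g>.</target>
</trans-unit>"""


def inline_codes(element):
    """Count (element name, id) pairs for the inline codes nested in an element."""
    codes = Counter()
    for node in element.iter():
        local = node.tag.split("}")[-1]  # strip the namespace prefix
        if local in INLINE:
            codes[(local, node.get("id"))] += 1
    return codes


def missing_codes(unit_xml):
    """Return sorted inline codes that occur in the source but not in the target."""
    unit = ET.fromstring(unit_xml)
    source = unit.find(f"{{{NS}}}source")
    target = unit.find(f"{{{NS}}}target")
    # Counter subtraction keeps only codes the target has fewer of.
    return sorted(inline_codes(source) - inline_codes(target))


missing = missing_codes(UNIT)
```

Because the comparison ignores position, reordering codes passes the check while a dropped code is reported.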

What is TBX and how does it structure terminology?

TBX (TermBase eXchange, ISO 30042) is a concept-oriented format for terminology databases. Each conceptEntry contains one concept with a langSec element for each language. Within each langSec, termSec elements hold individual terms with termNote metadata: termType (fullForm, abbreviation, acronym), partOfSpeech (noun, verb, adjective), and administrativeStatus (preferred, admitted, or deprecated; in TBX-Basic these are written preferredTerm-admn-sts, admittedTerm-admn-sts, and deprecatedTerm-admn-sts). This structure supports consistent terminology usage across translations, for example by marking "database" as the preferred English term and "DB" as an admitted abbreviation.
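
The conceptEntry/langSec/termSec hierarchy can be walked with a few lines of Python. The fragment below is a simplified, hypothetical entry: a complete TBX file also has a tbx root element, a header, and a namespace declaration, all omitted here, and the status values assume the TBX-Basic picklist spelling. The helper name terms_with_status is also an assumption.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Simplified, namespace-free conceptEntry used only for illustration.
ENTRY = """<conceptEntry id="c1">
  <langSec xml:lang="en">
    <termSec>
      <term>database</term>
      <termNote type="partOfSpeech">noun</termNote>
      <termNote type="administrativeStatus">preferredTerm-admn-sts</termNote>
    </termSec>
    <termSec>
      <term>DB</term>
      <termNote type="termType">abbreviation</termNote>
      <termNote type="administrativeStatus">admittedTerm-admn-sts</termNote>
    </termSec>
  </langSec>
  <langSec xml:lang="de">
    <termSec>
      <term>Datenbank</term>
      <termNote type="administrativeStatus">preferredTerm-admn-sts</termNote>
    </termSec>
  </langSec>
</conceptEntry>"""


def terms_with_status(entry_xml):
    """Return (language, term, administrativeStatus) for each termSec in the entry."""
    entry = ET.fromstring(entry_xml)
    rows = []
    for lang_sec in entry.iterfind("langSec"):
        lang = lang_sec.get(XML_LANG)
        for term_sec in lang_sec.iterfind("termSec"):
            # Collect termNote values keyed by their type attribute.
            notes = {n.get("type"): n.text for n in term_sec.iterfind("termNote")}
            rows.append((lang, term_sec.findtext("term"),
                         notes.get("administrativeStatus")))
    return rows


rows = terms_with_status(ENTRY)
```

All terms under one conceptEntry refer to the same concept, so a terminology checker can flag "DB" in a translation and suggest the preferred term for that language.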

What is the Okapi Framework and how does it relate to XLIFF?

The Okapi Framework is an open-source localization toolkit (currently distributed under the Apache License 2.0; older releases were LGPL) that converts 50+ file formats (HTML, XML, DOCX, JSON, PO, etc.) into XLIFF for translation and then merges the translated XLIFF back into the original format. Rainbow is the GUI tool and Tikal is the CLI tool. The extraction step preserves the original document structure while exposing translatable text as XLIFF trans-units. After translation in a CAT tool, the merge step reassembles the document with translated content while maintaining the original layout and formatting.

Is any data sent to a server when using this reference?

No. The entire localization format reference is embedded in the page and rendered client-side. Searching, filtering by category (XLIFF, TMX, TBX, Other Standards), and browsing entries all happen within your browser using JavaScript. No XLIFF markup, translation data, or search queries are transmitted to any server.