Filter System

Summary

OmegaT's filter system reads files in a wide variety of formats, extracts their translatable content, and writes the translated files back. Supported formats range from .po and .properties to .html, .xml, .xlsx, .docx, OpenOffice, and many more. Each filter is two-fold: it both reads and writes its format.

How Filters Work

A filter class can:

1. Read: parse a document in a given format
2. Extract: pull out translatable content as segments
3. Write: rebuild the document, replacing translatable content with translations

Key invariant: Filters must be two-fold (read & write the same format).
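
The read, extract, and write steps can be sketched with a toy filter for `key=value` lines. This is a minimal illustration, not OmegaT's filter API: the class `KeyValueFilterSketch` and its methods are hypothetical names, and the translation lookup is passed in as a plain function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a filter; not OmegaT's actual filter API.
final class KeyValueFilterSketch {

    // A line is translatable if it is not a comment and contains '='.
    private static boolean isEntry(String line) {
        return !line.startsWith("#") && line.indexOf('=') >= 0;
    }

    // Read + extract: return the value part of every "key=value" line.
    static List<String> extract(List<String> lines) {
        List<String> segments = new ArrayList<>();
        for (String line : lines) {
            if (isEntry(line)) {
                segments.add(line.substring(line.indexOf('=') + 1));
            }
        }
        return segments;
    }

    // Write: rebuild the document, swapping each value for its translation.
    static List<String> write(List<String> lines, UnaryOperator<String> translate) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (isEntry(line)) {
                int eq = line.indexOf('=');
                out.add(line.substring(0, eq + 1) + translate.apply(line.substring(eq + 1)));
            } else {
                out.add(line); // comments and other lines pass through unchanged
            }
        }
        return out;
    }
}
```

Because the same class both extracts the segments and rebuilds the file, it satisfies the two-fold invariant: everything that is not a segment is carried over to the output untouched.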

FilterMaster

FilterMaster is the central organizer:

- Maintains the registry of all available filters
- Detects which filter to use for a given file
- Routes files to appropriate filters based on extension and content

File Detection

OmegaT distinguishes files by:

- File extension: for example *.txt, *.po, *.html
- File content: some formats require content inspection
- Filter instantiation: a single filter class can be instantiated multiple times with different parameters (for example, a text file filter with different encodings)
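
As a rough sketch of how one filter class can appear several times in a registry with different parameters, the following keeps a list of configured instances and returns the first one that accepts a file name. All names here (`FilterRegistrySketch`, `Instance`, the suffix-based check) are hypothetical and do not mirror FilterMaster's real data structures; content inspection is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical registry; FilterMaster's real implementation is not shown here.
final class FilterRegistrySketch {

    // One configured instance of a filter class.
    static final class Instance {
        final String filterClass;
        final String suffix;
        final String encoding;

        Instance(String filterClass, String suffix, String encoding) {
            this.filterClass = filterClass;
            this.suffix = suffix;
            this.encoding = encoding;
        }

        // Assumption: matching is by file-name suffix only, ignoring case.
        boolean accepts(String fileName) {
            return fileName.toLowerCase(Locale.ROOT).endsWith(suffix);
        }
    }

    private final List<Instance> instances = new ArrayList<>();

    void register(Instance instance) {
        instances.add(instance);
    }

    // The first registered instance that accepts the file wins.
    Optional<Instance> find(String fileName) {
        for (Instance instance : instances) {
            if (instance.accepts(fileName)) {
                return Optional.of(instance);
            }
        }
        return Optional.empty();
    }
}
```

Registering the same `TextFilter` name once for `.sjis.txt` with Shift_JIS and once for `.txt` with UTF-8 lets the more specific instance pick up its files, provided it is registered first.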

Filename Patterns

Input patterns use DOS-style wildcards:

- *.txt: all files with the "txt" extension
- read*: all files whose names start with "read"
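
One common way to evaluate such wildcards is to translate them into a regular expression. The sketch below does that with `java.util.regex`; the class name `WildcardSketch` is hypothetical, support for `?` (one character) and case-insensitive matching are assumptions based on DOS conventions, and OmegaT's own matching code may differ.

```java
import java.util.regex.Pattern;

// Hypothetical helper; shows the wildcard-to-regex idea only.
final class WildcardSketch {

    // Translate a DOS-style wildcard into an equivalent regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            switch (c) {
                case '*':
                    regex.append(".*"); // any run of characters
                    break;
                case '?':
                    regex.append('.'); // exactly one character (assumption)
                    break;
                default:
                    // Quote everything else so '.' and similar stay literal.
                    regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // Assumption: DOS-style matching ignores case.
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // True if the whole file name matches the wildcard.
    static boolean matches(String wildcard, String fileName) {
        return toRegex(wildcard).matcher(fileName).matches();
    }
}
```

Quoting the literal characters matters: without it, the dot in `*.txt` would match any character and `readmeXtxt` would be accepted.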

Output patterns use variable substitution:

| Variable | Description |
| --- | --- |
| `${filename}` | Full input filename (default) |
| `${nameOnly}` | Name without extension |
| `${extension}` | File extension |
| `${sourceLanguage}` | Project's source language |
| `${targetLanguage}` | Project's target language |

Example: the Java Resource Bundles filter uses the output pattern `${nameOnly}_${targetLanguage}.${extension}`, which produces Messages_fr.properties from Messages.properties when the target language is fr.
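
The substitution can be sketched as plain string replacement over the five variables listed above. `OutputPatternSketch.expand` is a hypothetical helper, and splitting the name at the last dot is an assumption about how the name and extension are derived.

```java
// Hypothetical helper; expands the output-pattern variables listed above.
final class OutputPatternSketch {

    static String expand(String pattern, String fileName,
                         String sourceLanguage, String targetLanguage) {
        // Assumption: the extension is whatever follows the last dot.
        int dot = fileName.lastIndexOf('.');
        String nameOnly = dot > 0 ? fileName.substring(0, dot) : fileName;
        String extension = dot > 0 ? fileName.substring(dot + 1) : "";

        return pattern
                .replace("${filename}", fileName)
                .replace("${nameOnly}", nameOnly)
                .replace("${extension}", extension)
                .replace("${sourceLanguage}", sourceLanguage)
                .replace("${targetLanguage}", targetLanguage);
    }
}
```

Calling `expand("${nameOnly}_${targetLanguage}.${extension}", "Messages.properties", "en", "fr")` returns `Messages_fr.properties`, matching the resource bundle example.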

XML Filter Processing

OmegaT's XML filters classify tags into several kinds and use that classification to decide which parts of a paragraph are translatable.

Tag Classification

- Paragraph tags: declare new paragraphs; they do not define translatable or untranslatable parts
- Intact tags: mark content that should NOT be translated
- Paired tags: opening/closing tag pairs (for example, <a>...</a>)
- Content-based tags: tags that must always be preserved regardless of position

Tag Processing Flow

1. Handler.java collects all tags and text.
2. On a paragraph tag, it calls translateAndFlush().
3. Entry.detectTags() determines which parts are translatable.
4. The first (textStart) and last (textEnd) text elements are located, skipping elements that contain only spaces.
5. The markers are expanded to include paired tags inside the text range.
6. Content-based tags are always preserved.
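
The marker detection can be illustrated with a simplified model. The sketch below is not the code of Entry.detectTags(): the `Element` type and `detect` method are hypothetical, content-based tags are not modeled, and the expansion step follows one reading of the description, namely that a paired tag inside the range pulls its partner into the range.

```java
import java.util.List;

// Hypothetical model of a paragraph's elements; not OmegaT's Entry class.
final class TagRangeSketch {

    // A text run, or a tag that may point at the index of its paired tag.
    static final class Element {
        final boolean isText;
        final String value;
        final int partner; // index of the paired tag, or -1 if none

        private Element(boolean isText, String value, int partner) {
            this.isText = isText;
            this.value = value;
            this.partner = partner;
        }

        static Element text(String value) {
            return new Element(true, value, -1);
        }

        static Element tag(String value, int partner) {
            return new Element(false, value, partner);
        }
    }

    // Returns {textStart, textEnd}, or {-1, -1} if there is no non-blank text.
    static int[] detect(List<Element> elements) {
        int start = -1;
        int end = -1;
        for (int i = 0; i < elements.size(); i++) {
            Element e = elements.get(i);
            if (e.isText && !e.value.trim().isEmpty()) {
                if (start < 0) {
                    start = i; // first non-blank text
                }
                end = i; // last non-blank text seen so far
            }
        }
        if (start < 0) {
            return new int[] {-1, -1};
        }
        // Widen the range until every tag inside it has its partner inside too.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = start; i <= end; i++) {
                int p = elements.get(i).partner;
                if (p >= 0 && p < start) {
                    start = p;
                    changed = true;
                } else if (p > end) {
                    end = p;
                    changed = true;
                }
            }
        }
        return new int[] {start, end};
    }
}
```

For the elements `" "`, `<b>`, `Hello`, `</b>`, `" world"`, `<br/>`, `" "`, the text range starts as indices 2 to 4; because `</b>` lies inside it, the range widens to include the opening `<b>` at index 1, while the trailing `<br/>` and whitespace stay outside.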

Spaces Processing

| XML Type | Handling |
| --- | --- |
| Unformatted | Spaces are read literally; "Remove leading/trailing whitespace" can be disabled |
| Formatted | Impossible to distinguish formatting spaces from real spaces; "Remove leading/trailing whitespace" must be enabled |

TMXReader2/TMXWriter2 use a hybrid approach: segment text is treated as unformatted, while all other tags are formatted. This yields readable XML without whitespace problems.

Filter Types

OmegaT includes filters for:

- Plain text: .txt, .csv (with configurable encoding)
- Web: .html, .htm, .xhtml
- XML: .xml, .svg, and other XML-based formats
- Localization: .po (gettext), .properties (Java resource bundles)
- Office: OpenOffice/LibreOffice formats (.odt, .ods, .odp)
- Microsoft: .docx, .xlsx (via additional modules)
- DTP: .idml (InDesign)
- Software: .strings (macOS), .resx, .json
- Subtitle: .srt, .vtt

Creating Custom Filters

Custom filters can be distributed as plugins:

1. Implement the filter interface
2. Package as a .jar with manifest entry
3. Register via Core.registerFilterClass(MyFilter.class) in loadPlugins()
4. Define input/output filename patterns
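
The registration step might look roughly like the sketch below. To keep it compilable on its own, `Core` here is a small stand-in that only mimics the `registerFilterClass` call named above; in a real plugin that class comes from OmegaT. `MyFilterPlugin` is a hypothetical name for the class the manifest entry would point to, and `MyFilter` is an empty placeholder rather than a working filter.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for OmegaT's Core so the sketch compiles alone (assumption:
// only the registerFilterClass call from the steps above is mimicked).
final class Core {
    static final List<Class<?>> FILTERS = new ArrayList<>();

    static void registerFilterClass(Class<?> clazz) {
        FILTERS.add(clazz);
    }
}

// Placeholder; a real filter would implement OmegaT's filter interface
// and define its input/output filename patterns.
final class MyFilter {
}

// Hypothetical plugin entry class referenced from the .jar manifest.
final class MyFilterPlugin {

    // Called when plugins are loaded; registers the filter class.
    public static void loadPlugins() {
        Core.registerFilterClass(MyFilter.class);
    }
}
```

Registering the class rather than an instance fits the earlier point that a single filter class can be instantiated several times with different parameters.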