Filter System
Summary
OmegaT's filter system reads a wide variety of file formats, extracts their translatable content, and writes the translated documents back. Supported formats range from .po and .properties to .html, .xml, .xlsx, .docx, OpenOffice, and many more. Each filter is two-fold: it both reads and writes its format.
How Filters Work
A filter class can:
1. Read: parse a document in a given format
2. Extract: pull out translatable content as segments
3. Write: rebuild the document, replacing translatable content with translations
Key invariant: Filters must be two-fold (read & write the same format).
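The read, extract, and write steps can be illustrated with a minimal sketch. It assumes a toy "key=value" format and a hypothetical class named KeyValueFilterSketch; it is not OmegaT's real filter interface, only an illustration of the two-fold invariant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical two-fold filter for "key=value" lines; not OmegaT's real interface. */
final class KeyValueFilterSketch {

    /** Read + extract: collect the translatable values, skipping comments and other lines. */
    static List<String> extractSegments(String document) {
        List<String> segments = new ArrayList<>();
        for (String line : document.split("\n", -1)) {
            int eq = line.indexOf('=');
            if (!line.startsWith("#") && eq >= 0) {
                segments.add(line.substring(eq + 1));
            }
        }
        return segments;
    }

    /** Write: rebuild the same format, replacing each value with its translation. */
    static String writeTranslated(String document, UnaryOperator<String> translate) {
        StringBuilder out = new StringBuilder();
        String[] lines = document.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int eq = line.indexOf('=');
            if (!line.startsWith("#") && eq >= 0) {
                // Keep the key and '=' untouched; only the value is translated.
                out.append(line, 0, eq + 1).append(translate.apply(line.substring(eq + 1)));
            } else {
                out.append(line); // comments and blank lines pass through unchanged
            }
            if (i < lines.length - 1) {
                out.append('\n');
            }
        }
        return out.toString();
    }
}
```

Writing with an identity translation returns the input unchanged, which is a convenient check that a filter really is two-fold.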
FilterMaster
FilterMaster is the central organizer:
- Maintains the registry of all available filters
- Detects which filter to use for a given file
- Routes files to appropriate filters based on extension and content
File Detection
OmegaT distinguishes files by:
- File extension — e.g., *.txt, *.po, *.html
- File content — Some formats require content inspection
- Filter instantiation — A single filter class can be instantiated multiple times with different parameters (e.g., text file filter with different encodings)
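As a rough sketch of how a registry might combine extension matching with content inspection, the following uses hypothetical names (FilterRegistrySketch, FilterInstance). It is not FilterMaster's actual code, and the first-match-wins ordering is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical registry in the spirit of FilterMaster; all names are illustrative. */
final class FilterRegistrySketch {

    /** One configured filter instance: a label, an extension, and a content check. */
    static final class FilterInstance {
        final String name;
        final String extension;
        final Predicate<String> acceptsContent;

        FilterInstance(String name, String extension, Predicate<String> acceptsContent) {
            this.name = name;
            this.extension = extension;
            this.acceptsContent = acceptsContent;
        }
    }

    private final List<FilterInstance> instances = new ArrayList<>();

    void register(FilterInstance instance) {
        instances.add(instance);
    }

    /** Returns the first registered instance whose extension and content check both match. */
    Optional<FilterInstance> lookup(String fileName, String contentHead) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return instances.stream()
                .filter(f -> lower.endsWith("." + f.extension))
                .filter(f -> f.acceptsContent.test(contentHead))
                .findFirst();
    }
}
```

Registering a content-sensitive instance (for example one that accepts .xml only when the content contains "<html") ahead of a catch-all .xml instance lets the same extension route to different filters.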
Filename Patterns
Input patterns use DOS-style wildcards:
- *.txt — all files with "txt" extension
- read* — all files starting with "read"
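One common way to implement this kind of matching is to translate the wildcard into a regular expression. The sketch below assumes that `?` matches a single character and that matching is case-insensitive; neither detail is stated above, and WildcardSketch is a hypothetical name.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that matches file names against DOS-style wildcards. */
final class WildcardSketch {

    /** Converts a wildcard ('*' = any run of characters, '?' = one character) to a regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            switch (c) {
                case '*':
                    regex.append(".*");
                    break;
                case '?':
                    regex.append('.');
                    break;
                default:
                    // Quote everything else so '.' and similar characters match literally.
                    regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // Case-insensitive matching is an assumption of this sketch.
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** True when the whole file name matches the wildcard. */
    static boolean matches(String wildcard, String fileName) {
        return toRegex(wildcard).matcher(fileName).matches();
    }
}
```

Because the whole name must match, `*.txt` accepts notes.txt but rejects notes.txt.bak.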
Output patterns use variable substitution:
| Variable | Description |
|---|---|
| ${filename} | Full input filename (default) |
| ${nameOnly} | Name without extension |
| ${extension} | File extension |
| ${sourceLanguage} | Project's source language |
| ${targetLanguage} | Project's target language |
Example: Java Resource Bundles use ${nameOnly}_${targetLanguage}.${extension} to produce Messages_fr.properties from Messages.properties.
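The substitution can be sketched with plain string replacement. OutputPatternSketch and its method signature are hypothetical; the variable names come from the table above, and splitting the name at the last dot is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical expander for output filename patterns. */
final class OutputPatternSketch {

    static String expand(String pattern, String inputFileName,
                         String sourceLanguage, String targetLanguage) {
        // Assumption: the extension is whatever follows the last dot in the name.
        int dot = inputFileName.lastIndexOf('.');
        String nameOnly = dot > 0 ? inputFileName.substring(0, dot) : inputFileName;
        String extension = dot > 0 ? inputFileName.substring(dot + 1) : "";

        Map<String, String> variables = new LinkedHashMap<>();
        variables.put("${filename}", inputFileName);
        variables.put("${nameOnly}", nameOnly);
        variables.put("${extension}", extension);
        variables.put("${sourceLanguage}", sourceLanguage);
        variables.put("${targetLanguage}", targetLanguage);

        String result = pattern;
        for (Map.Entry<String, String> entry : variables.entrySet()) {
            // String.replace treats both arguments literally, so no regex escaping is needed.
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

With this sketch, `expand("${nameOnly}_${targetLanguage}.${extension}", "Messages.properties", "en", "fr")` returns Messages_fr.properties, matching the resource bundle example.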
XML Filter Processing
OmegaT handles XML by classifying tags and then using that classification to decide which parts of each paragraph are translatable:
Tag Classification
- Paragraph tags: Declare new paragraphs; don't define translatable/untranslatable parts
- Intact tags: Mark content that should NOT be translated
- Paired tags: Opening/closing tag pairs (e.g., <a>...</a>)
- Content-based tags: Tags that must always be preserved regardless of position
Tag Processing Flow
- Handler.java collects all tags and texts
- On paragraph tag, calls translateAndFlush()
- Entry.detectTags() determines which parts are translatable
- Finds first (textStart) and last (textEnd) text elements, skipping spaces-only elements
- Expands markers to include paired tags inside the text range
- Content-based tags are always preserved
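The range detection described in this flow can be sketched as follows. This is an illustration under stated assumptions, not the code of Entry.detectTags(): the Token type, the pairId field, and the rule "pull in an outside tag when its partner lies inside the range" are this sketch's reading of the steps, and content-based tags are left out for brevity.

```java
import java.util.List;

/** Hypothetical sketch of finding the translatable range within one paragraph. */
final class DetectTagsSketch {

    /** A token is either text or a tag; the two halves of a paired tag share a pairId. */
    static final class Token {
        final String value;
        final boolean isTag;
        final int pairId; // -1 for text and unpaired tags

        private Token(String value, boolean isTag, int pairId) {
            this.value = value;
            this.isTag = isTag;
            this.pairId = pairId;
        }

        static Token text(String value) {
            return new Token(value, false, -1);
        }

        static Token tag(String value, int pairId) {
            return new Token(value, true, pairId);
        }
    }

    /** Returns {start, end} (inclusive), or null when the paragraph has no real text. */
    static int[] translatableRange(List<Token> tokens) {
        int start = -1;
        int end = -1;
        // First and last text elements, skipping whitespace-only ones.
        for (int i = 0; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            if (!t.isTag && !t.value.trim().isEmpty()) {
                if (start < 0) {
                    start = i;
                }
                end = i;
            }
        }
        if (start < 0) {
            return null;
        }
        // Grow the range while an outside tag has its partner inside the range.
        boolean grown = true;
        while (grown) {
            grown = false;
            for (int i = 0; i < tokens.size(); i++) {
                if (i >= start && i <= end) {
                    continue;
                }
                Token t = tokens.get(i);
                if (t.isTag && t.pairId >= 0 && partnerInside(tokens, t.pairId, start, end)) {
                    if (i < start) {
                        start = i;
                    } else {
                        end = i;
                    }
                    grown = true;
                }
            }
        }
        return new int[] {start, end};
    }

    private static boolean partnerInside(List<Token> tokens, int pairId, int start, int end) {
        for (int j = start; j <= end; j++) {
            Token t = tokens.get(j);
            if (t.isTag && t.pairId == pairId) {
                return true;
            }
        }
        return false;
    }
}
```

For the token sequence `<i>`, " ", `<b>`, "Hello", `</b>`, " world", `</i>`, the range first covers "Hello" through " world" and then grows left to take in `<b>`, because its closing tag is already inside. The `<i>` pair stays outside because neither half lies within the range.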
Spaces Processing
| XML Type | Handling |
|---|---|
| Unformatted | Spaces are read literally; "Remove leading/trailing whitespace" can be disabled |
| Formatted | Impossible to distinguish formatting spaces from real spaces; "Remove leading/trailing whitespace" must be enabled |
TMXReader2/TMXWriter2 use a hybrid approach: segment text is unformatted, other tags are formatted — giving nice-looking XML without space issues.
Filter Types
OmegaT includes filters for:
- Plain text — .txt, .csv (with configurable encoding)
- Web — .html, .htm, .xhtml
- XML: .xml, .svg, and other XML-based formats
- Localization — .po (gettext), .properties (Java resource bundles)
- Office — OpenOffice/LibreOffice formats (.odt, .ods, .odp)
- Microsoft — .docx, .xlsx (via additional modules)
- DTP — .idml (InDesign)
- Software — .strings (macOS), .resx, .json
- Subtitle — .srt, .vtt
Creating Custom Filters
Custom filters can be distributed as plugins:
1. Implement the filter interface
2. Package as a .jar with a manifest entry
3. Register via Core.registerFilterClass(MyFilter.class) in loadPlugins()
4. Define input/output filename patterns
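The registration step might look like the sketch below. To keep it compilable without the OmegaT jars, Core here is a local stand-in that only records the registered class. MyFilter and MyFilterPlugin are hypothetical names, and the real method's parameter type may differ from the one assumed here.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for OmegaT's Core class so the sketch compiles on its own (assumption). */
final class Core {
    static final List<Class<?>> REGISTERED = new ArrayList<>();

    static void registerFilterClass(Class<?> filterClass) {
        REGISTERED.add(filterClass);
    }
}

/** Hypothetical filter; a real one would implement OmegaT's filter interface. */
final class MyFilter {
}

/** Hypothetical plugin entry class, the one referenced from the jar manifest. */
final class MyFilterPlugin {

    /** Called when plugins are loaded; registers the custom filter. */
    public static void loadPlugins() {
        Core.registerFilterClass(MyFilter.class);
    }
}
```

In a real plugin, the stand-in Core class would be dropped in favor of the one shipped with OmegaT, and MyFilter would also supply its input and output filename patterns.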
See Also
- OmegaT Architecture
- Plugin System