po4a v0.70

PO4A-GETTEXTIZE

Section: Po4a Tools (1p)
Updated: 2024-01-29

NAME

po4a-gettextize - convert an original file (and its translation) to a PO file

SYNOPSIS

po4a-gettextize -f fmt -m master.doc -l XX.doc -p XX.po

(XX.po is the output, all others are inputs)

DESCRIPTION

po4a (PO for anything) eases the maintenance of documentation translations using the classical gettext tools. The main feature of po4a is that it decouples the translation of the content from the structure of the document. Please refer to the page po4a(7) for a gentle introduction to this project.

The po4a-gettextize script helps you convert your previously existing translations into a po4a-based workflow. This is only to be done once to salvage an existing translation while converting to po4a, not on a regular basis after the conversion of your project. This tedious process is explained in detail in Section 'Converting a manual translation to po4a' below.

You must provide both a master file (e.g., the source in English) and an existing translated file (e.g., a previous translation attempt without po4a). If you provide more than one master or translation file, they will be used in sequence, but it may be easier to gettextize each page or chapter separately and then use msgmerge to merge all produced PO files. As you wish.
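As a rough sketch of the chapter-by-chapter approach, the following Python snippet only builds the command lines, one per chapter, without running them. The file names (intro.txt, intro.fr.txt and so on) and the choice of the text format are assumptions made for illustration; only the -f, -m, -l and -p options come from this page.

```python
import shlex

def gettextize_command(fmt, pairs, po_file):
    """Build the po4a-gettextize argument list for (master, localized) pairs."""
    cmd = ["po4a-gettextize", "-f", fmt]
    for master, localized in pairs:
        # -m and -l may each be repeated; the files are used in sequence.
        cmd += ["-m", master, "-l", localized]
    cmd += ["-p", po_file]
    return cmd

# Hypothetical chapter names: one PO file is produced per chapter.
chapters = ["intro", "usage"]
commands = [
    gettextize_command("text", [(f"{name}.txt", f"{name}.fr.txt")], f"{name}.fr.po")
    for name in chapters
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line can be reviewed and run by hand, which keeps a failure in one chapter from blocking the others.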

If the master document has non-ASCII characters, the newly generated PO file will be in UTF-8. If the master document is completely in ASCII, the generated PO will use the encoding of the translated input document.

OPTIONS

-f, --format
Format of the documentation you want to handle. Use the --help-format option to see the list of available formats.
-m, --master
File containing the master document to translate. You can use this option multiple times if you want to gettextize multiple documents.
-M, --master-charset
Charset of the file containing the document to translate.
-l, --localized
File containing the localized (translated) document. If you provided multiple master files, you can provide multiple localized files by using this option more than once.
-L, --localized-charset
Charset of the file containing the localized document.
-p, --po
File where the message catalog should be written. If not given, the message catalog will be written to the standard output.
-o, --option
Extra option(s) to pass to the format plugin. See the documentation of each plugin for more information about the valid options and their meanings. For example, you could pass '-o tablecells' to the AsciiDoc parser, while the text parser would accept '-o tabs=split'.
-h, --help
Show a short help message.
--help-format
List the documentation formats understood by po4a.
-k, --keep-temps
Keep the temporary master and localized POT files built before merging. This can be useful to understand why these files get desynchronized, leading to gettextization problems.
-V, --version
Display the version of the script and exit.
-v, --verbose
Increase the verbosity of the program.
-d, --debug
Output some debugging information.
--msgid-bugs-address email@address
Set the report address for msgid bugs. By default, the created POT files have no Report-Msgid-Bugs-To fields.
--copyright-holder string
Set the copyright holder in the POT header. The default value is "Free Software Foundation, Inc."
--package-name string
Set the package name for the POT header. The default is "PACKAGE".
--package-version string
Set the package version for the POT header. The default is "VERSION".
 

Converting a manual translation to po4a

po4a-gettextize synchronizes the master and localized files to extract their content into a PO file. The content of the master file gives the msgid while the content of the localized file gives the msgstr. This process is somewhat fragile: the Nth string of the translated file is supposed to be the translation of the Nth string in the original.

Gettextization works best if you manage to retrieve the exact version of the original document that was used for translation. Even so, you may need to fiddle with both master and localized files to align their structure if it was changed by the original translator, so working on copies of the files is advised.

Internally, each po4a parser reports the syntactical type of each extracted string. This is how desynchronizations are detected during the gettextization. In the example depicted below, it is very unlikely that the 4th string in translation (of type 'chapter') is the translation of the 4th string in original (of type 'paragraph'). It is more likely that a new paragraph was added to the original, or that two original paragraphs were merged together in the translation.

    Original          Translation

  chapter             chapter
    paragraph           paragraph
    paragraph           paragraph
    paragraph         chapter
  chapter               paragraph
    paragraph           paragraph

po4a-gettextize will verbosely diagnose any structure desynchronization. When this happens, you should manually edit the files to add fake paragraphs or remove some content here and there until the structure of both files actually matches. Some tricks are given below to salvage most of the existing translation while doing so.
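The positional pairing and the type check can be illustrated with a small Python sketch. This is not the actual po4a implementation, only a simplified model: the function name, the (type, text) tuples and the placeholder texts are assumptions, and the data reproduces the chapter/paragraph example above.

```python
def gettextize(master, localized):
    """Pair the Nth master string with the Nth localized string.

    Each input is a list of (type, text) tuples, as a parser might report.
    Returns (pairs, error); error is None when the structures match.
    """
    pairs = []
    for index, ((m_type, m_text), (l_type, l_text)) in enumerate(
        zip(master, localized), start=1
    ):
        if m_type != l_type:
            # Stop at the first desynchronization, keeping what was paired so far.
            return pairs, f"string {index}: original is '{m_type}', translation is '{l_type}'"
        pairs.append((m_text, l_text))  # msgid from master, msgstr from localized
    if len(master) != len(localized):
        return pairs, f"string count differs: {len(master)} vs {len(localized)}"
    return pairs, None

original = [("chapter", "c1"), ("paragraph", "p1"), ("paragraph", "p2"),
            ("paragraph", "p3"), ("chapter", "c2"), ("paragraph", "p4")]
translation = [("chapter", "C1"), ("paragraph", "P1"), ("paragraph", "P2"),
               ("chapter", "C2"), ("paragraph", "P3"), ("paragraph", "P4")]

pairs, error = gettextize(original, translation)
print(len(pairs), error)
```

In this model the first three strings are paired and the fourth one is reported, which mirrors the situation where a paragraph was added to the original after the translation was written.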

If you are lucky enough to have a perfect match in the file structures out of the box, building a correct PO file is a matter of seconds. Otherwise, you will soon understand why this process has such an ugly name :) Even so, gettextization often remains faster than translating everything again. I gettextized the French translation of the whole Perl documentation in one day despite the many synchronization issues. Given the amount of text (2 MB of original text), restarting the translation without first salvaging the old translations would have required several months of work. In addition, this grunt work is the price to pay to get the comfort of po4a. Once converted, the synchronization between master documents and translations will always be fully automatic.

After a successful gettextization, the produced documents should be manually checked for undetected disparities and silent errors, as explained below.

Tips and tricks for the gettextization process

The gettextization stops as soon as a desynchronization is detected. When this happens, you need to edit the files as much as needed to re-align the files' structures. po4a-gettextize is rather verbose when things go wrong. It reports the strings that don't match, their positions in the text, and the type of each of them. Moreover, the PO file generated so far is dumped as gettextization.failed.po for further inspection.

Here are some tricks to help you in this tedious process and ensure that you salvage most of the previous translation:

Remove all extra content of the translations, such as the section giving credits to the translators. They should be added separately to po4a as addenda (see po4a(7)).
When editing the files to align their structures, prefer editing the translation if possible. Indeed, if the changes to the original are too intrusive, the old and new versions will not be matched during the first po4a run after gettextization (see below). Any unmatched translation will be dumped anyway. That being said, you still want to edit the original document if it's too hard to get the gettextization to proceed otherwise, even if it means that one paragraph of the translation is dumped. The important thing is to get a first PO file to start with.
Do not hesitate to remove any original content that does not exist in the translated version. This content will be automatically reintroduced afterward, when synchronizing the PO file with the document.
You should probably inform the original author of any structural change in the translation that seems justified. Issues in the original document should be reported to the author. Fixing them in your translation only fixes them for a part of the community. Plus, it is impossible to do so when using po4a ;) But you probably want to wait until the end of the conversion to po4a before changing the original files.
Sometimes, the paragraph contents match, but their types do not. Fixing this is rather format-dependent. In POD and man, it often comes from the fact that one of them contains a line beginning with a white space while the other does not. In those formats, such a paragraph cannot be wrapped and thus becomes a different type. Just remove the space and you are done. In XML, it may also be a typo in a tag name.

Likewise, two paragraphs may get merged together in POD when the separating line contains some spaces, or when there is no empty line between the =item line and the content of the item.

Sometimes, the desynchronization message seems odd because the translation is attached to the wrong original paragraph. It is the sign of an undetected issue earlier in the process. Search for the actual desynchronization point by inspecting the file gettextization.failed.po that was produced, and fix the problem where it really is.
Other issues may come from duplicated strings in either the original or translation. Duplicated strings are merged in PO files, with two references. This constitutes a difficulty for the gettextization algorithm, which is a simple one-to-one pairing between the msgids of both the master and the localized files. It is however believed that recent versions of po4a deal properly with duplicated strings, so you should report any remaining issue that you may encounter.
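The whitespace problems mentioned in the tips above (a line accidentally starting with a space, or a separator line that looks empty but contains spaces) are hard to spot by eye. The following Python sketch lists such lines so they can be compared between the master and the translation. The helper names and the sample text are assumptions; the script is not part of po4a.

```python
def indented_lines(pod_text):
    """Return 1-based numbers of non-blank lines that start with a space or tab."""
    return [
        number
        for number, line in enumerate(pod_text.splitlines(), start=1)
        if line.strip() and line[0] in " \t"
    ]

def whitespace_only_lines(pod_text):
    """Return 1-based numbers of separator lines that contain only whitespace."""
    return [
        number
        for number, line in enumerate(pod_text.splitlines(), start=1)
        if line and not line.strip()
    ]

# Hypothetical POD excerpt: line 3 is indented, line 4 holds two spaces.
sample = "=head1 NAME\n\n Indented by accident.\n  \nNext paragraph.\n"
print(indented_lines(sample), whitespace_only_lines(sample))
```

Running both helpers on the two files and comparing the results gives a quick list of candidate lines to inspect before launching po4a-gettextize again.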
 

Reviewing files produced by po4a-gettextize

Any file produced by po4a-gettextize should be manually reviewed, even when the script terminates successfully. You should skim over the PO file, ensuring that the msgid and msgstr actually match. It is not necessary to ensure that the translation is perfectly correct yet, as all entries are marked as fuzzy translations anyway. You only need to check for obvious matching issues because badly matched translations will be dumped in subsequent steps while you want to salvage them.

Fortunately, this step does not require mastering the target languages, as you only want to recognize similar elements in each msgid and its corresponding msgstr. As a speaker of French, English, and some German myself, I can do this for all European languages at least, even if I cannot say one word of most of these languages. I sometimes manage to detect matching issues in non-Latin languages by looking at string length, phrase structures (does the number of question marks match?) and other clues, but I prefer when someone else can review those languages.

If you detect a mismatch, edit the original and translation files as if po4a-gettextize reported an error, and try again. Once you have a decent PO file for your previous translation, back it up until you get po4a working correctly.
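The clues mentioned above (string length and the number of question marks) can be turned into a crude filter that points at entries worth a closer look. This Python sketch is only a heuristic under stated assumptions: the function name and the length ratio of 3 are arbitrary choices, and languages that use another question mark character will trigger false alarms.

```python
def looks_mismatched(msgid, msgstr, max_ratio=3.0):
    """Flag a pair whose question-mark counts differ or whose lengths diverge a lot."""
    if msgid.count("?") != msgstr.count("?"):
        return True
    shorter, longer = sorted((len(msgid), len(msgstr)))
    if shorter == 0:
        # One side is empty: suspicious unless both are empty.
        return longer > 0
    return longer / shorter > max_ratio

print(looks_mismatched("Do you want to continue?", "Voulez-vous continuer ?"))
print(looks_mismatched("Do you want to continue?", "Options"))
```

A flagged pair is not necessarily wrong, and an unflagged pair is not necessarily right. The filter only helps to decide where to start reading.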

Running po4a for the first time

The easiest way to set up po4a is to write a po4a.conf configuration file, and use the integrated po4a program (po4a-updatepo and po4a-translate are deprecated). Please check the "CONFIGURATION FILE" Section in po4a(1) documentation for more details.

When po4a runs for the first time, the current version of the master documents will be used to update the PO files containing the old translations that you salvaged through gettextization. This can take quite a long time, because many of the msgids from the gettextization do not exactly match the elements of the POT file built from the recent master files. This forces gettext to search for the closest one using a costly string proximity algorithm. For example, the first run over the Perl documentation's French translation (5.5 MB PO file) took about 48 hours (yes, two days) while the subsequent ones only take seconds.
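To give an idea of why this search is expensive, here is a small Python sketch of a closest-string lookup. It uses difflib from the Python standard library as a stand-in; it is not the algorithm implemented by gettext, and the function name and sample strings are made up for illustration.

```python
import difflib

def closest_msgid(old_msgid, new_msgids, cutoff=0.6):
    """Return the new msgid most similar to old_msgid, or None if none is close."""
    matches = difflib.get_close_matches(old_msgid, new_msgids, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# msgids from a hypothetical POT file built from the current master document.
new_msgids = [
    "This option enables verbose output.",
    "Display the version of the script and exit.",
]

# An outdated msgid salvaged by gettextization, slightly different from the new one.
print(closest_msgid("This option enable verbose outputs.", new_msgids))
```

In this sketch every msgid without an exact match is compared against all candidates, so the work grows with the product of both counts. That is consistent with a long first run followed by fast subsequent runs, once most msgids match exactly.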

Moving your translations to production

After this first run, the PO files are ready to be reviewed by translators. All entries were marked as fuzzy in the PO file by po4a-gettextize, forcing their careful review before use. Translators should take each entry to verify that the salvaged translation actually matches the current original text, update the translation as needed, and remove the fuzzy markers.

Once enough fuzzy markers are removed, po4a will start generating the translation files on disk, and you're ready to move your translation workflow to production. Some projects find it useful to rely on Weblate to coordinate between translators and maintainers, but that's beyond po4a's scope.
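Progress on the review can be followed by counting the entries that still carry the fuzzy flag. The Python sketch below is a minimal illustration, not a full PO parser: it assumes that entries are separated by single blank lines, it ignores untranslated entries with an empty msgstr, and the function names and sample catalog are made up.

```python
def is_header(lines):
    """The header entry is the one whose empty msgid is directly followed by msgstr."""
    for i, line in enumerate(lines):
        if line == 'msgid ""':
            return i + 1 < len(lines) and lines[i + 1].startswith("msgstr")
    return False

def translation_stats(po_text):
    """Return (total, fuzzy) entry counts, skipping the header entry."""
    total = fuzzy = 0
    for block in po_text.strip().split("\n\n"):
        lines = block.splitlines()
        if not any(line.startswith("msgid ") for line in lines) or is_header(lines):
            continue
        total += 1
        # Flags are stored on comment lines starting with "#,".
        if any(line.startswith("#,") and "fuzzy" in line for line in lines):
            fuzzy += 1
    return total, fuzzy

sample = '''\
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

#, fuzzy
msgid "NAME"
msgstr "NOM"

msgid "SYNOPSIS"
msgstr "SYNOPSIS"
'''

total, fuzzy = translation_stats(sample)
print(f"{total - fuzzy}/{total} entries reviewed")
```

For real catalogs, the statistics printed by the gettext tools are more reliable than this sketch, which only serves to show what the fuzzy markers look like in the file.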

SEE ALSO

po4a(1), po4a-normalize(1), po4a-translate(1), po4a-updatepo(1), po4a(7).  

AUTHORS

 Denis Barbier <barbier@linuxfr.org>
 Nicolas François <nicolas.francois@centraliens.net>
 Martin Quinson (mquinson#debian.org)

 

COPYRIGHT AND LICENSE

Copyright 2002-2023 by SPI, inc.

This program is free software; you may redistribute it and/or modify it under the terms of GPL v2.0 or later (see the COPYING file).


 

