LOCALE::PO4A::TRANSTRACTOR.3PM

Section: User Contributed Perl Documentation (1)
Updated: 2025-06-30

NAME

Locale::Po4a::TransTractor - generic translation extractor.

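DESCRIPTION

The goal of the po4a (PO for anything) project is to ease translation (and, more interestingly, the maintenance of translations) by using the gettext tools in areas where they were not originally expected, such as documentation.

This class is the ancestor of every po4a parser. It is used to parse a document, search for translatable strings, extract them to a PO file, and replace them with their translations in the output document.

More formally, it takes the following arguments as input:

-: a document to translate;
-: a PO file containing the translations to use.

As output, it produces:

-: another PO file, resulting from the extraction of translatable strings from the input document;
-: a translated document, with the same structure as the input one, but with all translatable strings replaced by the translations found in the PO file provided as input.

Here is a graphical representation of this: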

   Input document --\                             /---> Output document
                     \                           /       (translated)
                      +-> parse() function -----+
                     /                           \
   Input PO --------/                             \---> Output PO
                                                         (extracted)

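FUNCTIONS YOUR PARSER SHOULD OVERRIDE

parse(): This is where all the work takes place: parsing the input documents, generating the output, and extracting the translatable strings. This is fairly simple to do with the functions presented in the section INTERNAL FUNCTIONS below. See also the SYNOPSIS, which presents an example.
This function is called by the process() function below, but if you choose to use the new() function and add content to your document manually, you will have to call this function yourself.
docheader(): This function returns the header we should add to the produced document, quoted properly so that it is a comment in the target language. See the section Educating developers about translations in po4a(7) for what it is good for.

SYNOPSIS

The following example parses a list of paragraphs beginning with "<p>". For the sake of simplicity, we assume that the document is well formed, i.e. that "<p>" tags are the only tags present, and that this tag is at the very beginning of each paragraph.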

 sub parse {
   my $self = shift;

   PARAGRAPH: while (1) {
       my ($paragraph,$pararef)=("","");
       my $first=1;
       my ($line,$lref)=$self->shiftline();
       while (defined($line)) {
           if ($line =~ m/<p>/ && !$first--) {
               # Not the first time we see <p>.
               # Reput the current line in input,
               #  and put the built paragraph to output
               $self->unshiftline($line,$lref);

               # Now that the document is formed, translate it:
               #   - Remove the leading tag
               $paragraph =~ s/^<p>//s;

               #   - push to output the leading tag (untranslated) and the
               #     rest of the paragraph (translated)
               $self->pushline(  "<p>"
                               . $self->translate($paragraph,$pararef)
                               );

               next PARAGRAPH;
           } else {
               # Append to the paragraph
               $paragraph .= $line;
               $pararef = $lref unless(length($pararef));
           }

           # Reinit the loop
           ($line,$lref)=$self->shiftline();
       }
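       # Did not get a defined line? End of input file.
       # Flush the paragraph still being built, if any, before returning.
       if (length($paragraph)) {
           $paragraph =~ s/^<p>//s;
           $self->pushline("<p>" . $self->translate($paragraph,$pararef));
       }
       return;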
   }
 }

Once you have implemented the parse function, you can use your document class through the public interface presented in the next section.
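To experiment with the parse() loop above without installing po4a, it can be run against a small stand-in class. The following is only a sketch: ToyTractor, its in-memory input, and its upper-casing translate() stub are hypothetical and are not part of po4a. The parse() body follows the example above and pushes the last paragraph when the input runs out.

```perl
use strict;
use warnings;

# ToyTractor is a hypothetical stand-in for Locale::Po4a::TransTractor.
package ToyTractor;

sub new {
    my ($class, @lines) = @_;
    my @doc_in;
    my $n = 0;
    # Store each line followed by its "file:line" reference.
    for my $l (@lines) { $n++; push @doc_in, $l, "toy.html:$n"; }
    return bless { doc_in => \@doc_in, doc_out => [] }, $class;
}

sub shiftline   { my $s = shift; return splice @{ $s->{doc_in} }, 0, 2 }
sub unshiftline { my ($s, $l, $r) = @_; unshift @{ $s->{doc_in} }, $l, $r }
sub pushline    { my ($s, $l) = @_; push @{ $s->{doc_out} }, $l }

# Stub: the real translate() looks the string up in the input PO file.
sub translate   { my ($s, $text, $ref) = @_; return uc $text }

sub parse {
    my $self = shift;
    PARAGRAPH: while (1) {
        my ($paragraph, $pararef) = ("", "");
        my $first = 1;
        my ($line, $lref) = $self->shiftline();
        while (defined($line)) {
            if ($line =~ m/<p>/ && !$first--) {
                # A new paragraph starts: put the line back and emit
                # the paragraph built so far.
                $self->unshiftline($line, $lref);
                $paragraph =~ s/^<p>//s;
                $self->pushline("<p>" . $self->translate($paragraph, $pararef));
                next PARAGRAPH;
            }
            $paragraph .= $line;
            $pararef = $lref unless length($pararef);
            ($line, $lref) = $self->shiftline();
        }
        # End of input: emit the last paragraph, if any.
        if (length($paragraph)) {
            $paragraph =~ s/^<p>//s;
            $self->pushline("<p>" . $self->translate($paragraph, $pararef));
        }
        return;
    }
}

package main;

my $doc = ToyTractor->new("<p>first\n", "line\n", "<p>second\n");
$doc->parse();
print @{ $doc->{doc_out} };
```

A real parser would inherit shiftline(), unshiftline(), pushline() and translate() from the base class instead of defining them.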

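PUBLIC INTERFACE for scripts using your parser

Constructor

process(%)

This function can do everything you need to do with a po4a document in one invocation. Its arguments must be packed as a hash. ACTIONS: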

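a.: Reads all the PO files specified in po_in_name
b.: Reads all the original documents specified in file_in_name
c.: Parses the document
d.: Reads and applies all the addenda specified
e.: Writes the translated document to file_out_name (if given)
f.: Writes the extracted PO file to po_out_name (if given)

ARGUMENTS, besides the ones accepted by new() (with expected type):

file_in_name (@): List of filenames from which we should read the input document.
file_in_charset ($): Charset used in the input document (if it isn't specified, use UTF-8).
file_out_name ($): Filename to which we should write the output document.
file_out_charset ($): Charset used in the output document (if it isn't specified, use UTF-8).
po_in_name (@): List of filenames from which we should read the input PO files, containing the translations that will be used to translate the document.
po_out_name ($): Filename to which we should write the output PO file, containing the strings extracted from the input document.
addendum (@): List of filenames from which we should read the addenda.
addendum_charset ($): Charset of the addenda.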
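A call to process() with these arguments might look like the sketch below. Everything specific in it is an assumption made for illustration: the file names are placeholders, the "text" format is an arbitrary choice, and the parser is instantiated through the Locale::Po4a::Chooser module shipped with po4a, which this page does not describe. The real call is only attempted when po4a and the sample files are present.

```perl
use strict;
use warnings;

# Arguments for process(), packed as a hash. File names are placeholders.
my %args = (
    file_in_name     => ['manual.txt'],     # (@) input documents
    file_in_charset  => 'UTF-8',            # ($)
    file_out_name    => 'manual.fr.txt',    # ($) translated document
    file_out_charset => 'UTF-8',            # ($)
    po_in_name       => ['po/fr.po'],       # (@) translations to apply
    po_out_name      => 'po/manual.pot',    # ($) extracted strings
);

# Only attempt the real call when po4a is installed and every input exists.
my @inputs    = (@{ $args{file_in_name} }, @{ $args{po_in_name} });
my $have_po4a = eval { require Locale::Po4a::Chooser; 1 };
my $missing   = grep { !-e $_ } @inputs;

if ($have_po4a && !$missing) {
    # Assumption: 'text' selects the plain-text parser.
    my $doc = Locale::Po4a::Chooser::new('text');
    $doc->process(%args);
} else {
    print "po4a or the sample files are not available; skipping.\n";
}
```

List-typed arguments, marked (@) above, are passed as array references inside the hash.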

new(%)

Creates a new po4a document. Accepted options (in the hash passed as a parameter):

verbose ($): Sets the verbosity.
debug ($): Sets the debugging.
wrapcol ($): The column at which we should wrap text in the output document (default: 76). A negative value means not to wrap lines at all.

It also accepts the following options for the underlying PO files: porefs, copyright-holder, msgid-bugs-address, package-name, package-version, wrap-po.

Manipulating document files

read($$$)

Adds the data of another input document to the end of the existing array "@{$self->{TT}{doc_in}}".

This function takes two mandatory arguments and an optional one.
* The filename to read on disk;
* The name to use as filename when building the reference in the PO file;
* The charset to use to read that file (UTF-8 by default)

The array "@{$self->{TT}{doc_in}}" holds the input document data as an array of strings with alternating meanings.
* The string $textline, holding one line of the input text data.
* The string "$filename:$linenum", holding its location and called the "reference" ("linenum" starts at 1).

Please note that it parses nothing. You should use the parse() function when you are done packing input files into the document.
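The alternating layout of "@{$self->{TT}{doc_in}}" can be pictured with a plain array. This is a minimal sketch, assuming in-memory lines instead of a file on disk; build_doc_in() is a hypothetical helper and not a po4a function.

```perl
use strict;
use warnings;

# build_doc_in() is a hypothetical helper, not part of po4a.
sub build_doc_in {
    my ($refname, @lines) = @_;
    my @doc_in;
    my $linenum = 0;
    for my $textline (@lines) {
        $linenum++;                                    # numbering starts at 1
        push @doc_in, $textline, "$refname:$linenum";  # text, then reference
    }
    return @doc_in;
}

my @doc_in = build_doc_in("manual.txt", "Title\n", "Body text\n");

# Walk the array two items at a time: the text, then its reference.
for (my $i = 0; $i < @doc_in; $i += 2) {
    my ($text, $ref) = @doc_in[$i, $i + 1];
    chomp $text;
    print "$ref => $text\n";
}
```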

write($)

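Writes the translated document to the given filename.

The translated document data is provided by:
* "$self->docheader()", holding the header text for the plugin, and
* "@{$self->{TT}{doc_out}}", holding each line of the main translated text in the array.

Manipulating PO files

readpo($)

Adds the content of a file (whose name is passed as argument) to the existing input PO. The old content is not discarded.

writepo($)

Writes the extracted PO file to the given filename.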

stats()

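Returns some statistics about the translation done so far. Please note that these are not the same statistics as the ones printed by msgfmt --statistics. Here, they describe the recent usage of the PO file, while msgfmt reports the status of the file. It is a wrapper around the Locale::Po4a::Po::stats_get function applied to the input PO file. Example of use: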

    [normal use of the po4a document...]

    ($percent,$hit,$queries) = $document->stats();
    print "We found translations for $percent\%  ($hit from $queries) of strings.\n";

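Manipulating addenda

addendum($): Please refer to po4a(7) for more information on what addenda are and how translators should write them. To apply an addendum to the translated document, simply pass its filename to this function and you are done ;)
This function returns a non-null integer on error.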

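INTERNAL FUNCTIONS used to write derivative parsers

Getting input, providing output

Four functions are provided to get input and return output. They are very similar to Perl's shift/unshift and push/pop.

 * Perl shift returns the first array item and drops it from the array.
 * Perl unshift prepends an item to the array as the first array item.
 * Perl pop returns the last array item and drops it from the array.
 * Perl push appends an item to the array as the last array item.

The first pair is about input, while the second is about output. Mnemonic: in input, you are interested in the first line, which is what shift gives, and in output you want to add your result at the end, as push does.

shiftline(): This function returns the first line to be parsed and its corresponding reference (packed as an array) from the array "@{$self->{TT}{doc_in}}" and drops these first two array items. Here, the reference is the string "$filename:$linenum".
unshiftline($$): Puts the last shifted line of the input document and its corresponding reference back at the head of "@{$self->{TT}{doc_in}}".
pushline($): Pushes a new line to the end of "@{$self->{TT}{doc_out}}".
popline(): Pops the last pushed line from the end of "@{$self->{TT}{doc_out}}".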
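The correspondence with Perl's own array operators can be sketched with two plain arrays standing in for "@{$self->{TT}{doc_in}}" and "@{$self->{TT}{doc_out}}". The data below is made up for illustration, and each comment names the TransTractor function that the statement imitates.

```perl
use strict;
use warnings;

# Plain arrays standing in for the input and output documents.
my @doc_in  = ("line one\n", "demo.txt:1", "line two\n", "demo.txt:2");
my @doc_out;

# shiftline(): take the first line and its reference off the input.
my ($line, $ref) = splice @doc_in, 0, 2;

# unshiftline($line, $ref): put both back at the head of the input.
unshift @doc_in, $line, $ref;

# pushline($line): append lines to the end of the output.
push @doc_out, "translated one\n";
push @doc_out, "oops\n";

# popline(): take back the last line pushed to the output.
my $undone = pop @doc_out;

print "next input line: $doc_in[0]";
print "output so far:   @doc_out";
```

Input lines always travel with their reference, which is why the input side moves two array items at a time while the output side moves one.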

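Marking strings as translatable

One function is provided to handle the text that should be translated.

translate($$$)

Mandatory arguments:

-: A string to translate
-: The reference of this string (i.e. its position in the input file)
-: The type of this string (i.e. the textual description of its structural role; used in Locale::Po4a::Po::gettextization(); see also po4a(7), section Gettextization: how does it work?)

This function can also take some extra arguments. They must be organized as a hash. For example: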

  $self->translate("string","ref","type",
                   'wrap' => 1);

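wrap: boolean indicating whether whitespace in the string can be considered unimportant. If so, the function canonicalizes the string before looking for a translation or extracting it, and wraps the translation.
wrapcol: the column at which we should wrap (default: the value of wrapcol specified during creation of the TransTractor, or 76). A negative value will be subtracted from the default.
comment: an extra comment to add to the entry.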
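The effect of wrap and wrapcol can be approximated as follows. This is a sketch under assumptions, not po4a's actual code: effective_wrapcol() and normalize_and_wrap() are hypothetical helpers, a negative wrapcol is read as an offset from the default, and the wrapping itself is delegated to the core Text::Wrap module.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper: resolve the wrap column from the default and
# the value requested in the call.
sub effective_wrapcol {
    my ($default, $requested) = @_;
    return $default unless defined $requested;
    # A negative value is taken relative to the default.
    return $requested < 0 ? $default + $requested : $requested;
}

# Hypothetical helper: treat whitespace as insignificant, then re-wrap.
sub normalize_and_wrap {
    my ($text, $col) = @_;
    $text =~ s/\s+/ /g;        # collapse runs of whitespace
    $text =~ s/\A | \z//g;     # trim both ends
    local $Text::Wrap::columns = $col;
    return wrap('', '', $text);
}

my $col = effective_wrapcol(76, -10);
print "wrapping at column $col\n";
print normalize_and_wrap("Some   text\n  spread over\nseveral lines.", $col), "\n";
```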

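Actions:

-: Pushes the string, reference and type to po_out.
-: Returns the translation of the string (as found in po_in) so that the parser can build the doc_out.
-: Handles the charsets to recode the strings before sending them to po_out and before returning the translations.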
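The first two actions can be illustrated with plain Perl data structures. The sketch below is a toy model: toy_translate(), the %po_in hash and the @po_out array are hypothetical stand-ins for translate() and the PO objects, charset handling is left out, and returning the original string when no translation is found is an assumption of this sketch.

```perl
use strict;
use warnings;

my %po_in = ("Hello" => "Bonjour");   # msgid => msgstr from the input PO
my @po_out;                           # entries extracted for the output PO

# toy_translate() is a hypothetical stand-in for translate().
sub toy_translate {
    my ($string, $ref, $type) = @_;
    # Action 1: record the string with its reference and type.
    push @po_out, { msgid => $string, reference => $ref, type => $type };
    # Action 2: return the translation, or the original if none is known.
    return exists $po_in{$string} ? $po_in{$string} : $string;
}

my @doc_out = (
    toy_translate("Hello",   "demo.txt:1", "title"),
    toy_translate("Goodbye", "demo.txt:2", "paragraph"),
);

print "doc_out: @doc_out\n";
print "extracted: ", join(", ", map { $_->{msgid} } @po_out), "\n";
```

Every string passes through extraction whether or not a translation exists, which is how the output PO ends up listing the whole document.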

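Misc functions

verbose(): Returns whether the verbose option was passed during the creation of the TransTractor.
debug(): Returns whether the debugging option was passed during the creation of the TransTractor.
get_in_charset(): This function returns the charset that was provided as the master charset.
get_out_charset(): This function returns the charset that should be used in the output document (usually useful to substitute the input document's detected charset where it has been found).
It will use the output charset specified on the command line. If none was specified, it will use the input PO's charset, and if the input PO has the default "CHARSET", it will return the input document's charset, so that no encoding is performed.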
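The fallback order of get_out_charset() reads naturally as a chain of early returns. The following sketch only mirrors the description above: pick_out_charset() and its named arguments are hypothetical and do not reflect how the module stores these values.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the fallback order described above.
sub pick_out_charset {
    my (%opt) = @_;
    # 1. An output charset given on the command line wins.
    return $opt{cmdline} if defined $opt{cmdline} && length $opt{cmdline};
    # 2. Otherwise use the input PO's charset, unless it is still the
    #    "CHARSET" placeholder of an unfilled PO header.
    return $opt{po} if defined $opt{po} && $opt{po} ne 'CHARSET';
    # 3. Fall back to the input document's charset (no re-encoding).
    return $opt{doc};
}

print pick_out_charset(cmdline => 'ISO-8859-1', po => 'UTF-8', doc => 'UTF-8'), "\n";
print pick_out_charset(po => 'KOI8-R',  doc => 'UTF-8'), "\n";
print pick_out_charset(po => 'CHARSET', doc => 'UTF-8'), "\n";
```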

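FUTURE DIRECTIONS

One shortcoming of the current TransTractor is that it cannot handle translated documents containing all languages, such as debconf templates or .desktop files.

To address this problem, the only interface changes needed are:

-: take a hash as po_in_name (a list per language)
-: add an argument to translate to indicate the target language
-: make a pushline_all function, which would call pushline on its content for all languages, using a map-like syntax: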

    $self->pushline_all({ "Description[".$langcode."]=".
                          $self->translate($line,$ref,$langcode)
                        });

We will see whether that is enough ;)

AUTHORS

 Denis Barbier <barbier@linuxfr.org>
 Martin Quinson (mquinson#debian.org)
 Jordi Vilalta <jvprat@gmail.com>

This document was created by using the manual pages.
Time: 13:03:56 GMT, June 30, 2025
