Difference between revisions of "ReNamer:Rules:Translit"

From den4b Wiki

Latest revision as of 10:43, 6 January 2023

[Screenshot: Translit rule window]

This rule transliterates one alphabet into another. Its main purpose is to convert non-English characters from various languages into their English/Latin representation. For example, the German character ü can be transliterated to ue (the name Müller can also be written as Mueller).

This rule uses transliteration maps (explained below).

Transliteration maps

To transliterate, we create a pair of equivalent characters, like this: ü=ue

(Note that the right side of this equation has two characters. Any number of characters may be placed on either side of the equation.)

Converting one language into another needs several such equivalent character pairs. The entire set is called a transliteration map. (In effect, a map is a character-level find-and-replace rule.)

ReNamer has several such built-in maps. Each map is named after a language (the second language in all maps is English).

Each map can be used in both directions (e.g. French-to-English or English-to-French).
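The idea of a map as a find-and-replace table that works in both directions can be sketched in Python. This is a minimal illustration, not ReNamer's implementation: the function name `transliterate` is hypothetical, and the sketch assumes that at each position the pairs are tried in the order they are listed and the first match wins. Case handling is left out here.

```python
def transliterate(text, pairs, backward=False):
    """Apply a list of (left, right) pairs to text.

    forward:  each left part is replaced by its right part
    backward: each right part is replaced by its left part
    """
    if backward:
        pairs = [(right, left) for left, right in pairs]
    out = []
    i = 0
    while i < len(text):
        for src, dst in pairs:
            # assumption: pairs are tried in list order, first match wins
            if src and text.startswith(src, i):
                out.append(dst)
                i += len(src)
                break
        else:
            # no pair matched at this position: copy the character unchanged
            out.append(text[i])
            i += 1
    return "".join(out)


german = [("ü", "ue"), ("ö", "oe"), ("ß", "ss")]
print(transliterate("Müller", german))                  # Mueller
print(transliterate("Mueller", german, backward=True))  # Müller
```

The backward direction simply swaps the two sides of every pair, which is why a single map can serve both directions.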

When you open the Translit rule, its window does not contain any map. You are free to do any of the following:

  1. Use one of the built-in maps, in either the forward or the backward direction.
  2. Create your own map and use it.
  3. Edit a built-in map, and then use it.

Let us see how to do this.

Automatic case conversion

The Translit rule performs automatic case conversion, using an algorithm adapted specifically for transliteration. The rule ignores the case of the alphabet entries, i.e. "A=B" is the same as "a=b". The case of each output fragment is decided by the case of the matching input fragment. Multi-character output fragments are treated as parts of words, so their case also depends on the case of the surrounding letters.

The case conversion logic is as follows (as of the ReNamer beta of 23 August 2009):

set OUTPUT-PART to lower case
if first letter in INPUT-PART is upper case then
  if length of OUTPUT-PART is greater than 1 then
    if next letter in original name is upper case then
      convert whole OUTPUT-PART to upper case
    else
      convert only first letter in OUTPUT-PART to upper case
  else
    convert whole OUTPUT-PART to upper case
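The pseudocode above can be transcribed into Python as follows. The function name `apply_case` and its `next_char` parameter are hypothetical, and the sketch assumes that a fragment at the very end of the name (no next letter) is handled as if the next letter were lower case, which the pseudocode does not specify.

```python
def apply_case(input_part, output_part, next_char=""):
    """Adjust the case of one transliterated fragment.

    input_part:  fragment matched in the original name, e.g. "Ü"
    output_part: its replacement from the alphabet, e.g. "ue"
    next_char:   letter that follows input_part in the original name
                 ("" at the end of the name; treated as not upper case)
    """
    result = output_part.lower()          # set OUTPUT-PART to lower case
    if input_part[:1].isupper():          # first letter of INPUT-PART is upper case
        if len(result) > 1:
            if next_char.isupper():       # e.g. "ÜBER": whole fragment upper, "UE"
                result = result.upper()
            else:                         # e.g. "Über": first letter only, "Ue"
                result = result[:1].upper() + result[1:]
        else:
            result = result.upper()
    return result


print(apply_case("Ü", "ue", "b"))  # Ue
print(apply_case("Ü", "ue", "B"))  # UE
print(apply_case("ü", "ue", "b"))  # ue
```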

Using a built-in transliteration map

To select one of the built-in maps, press the transliteration maps button. A list of available transliteration maps pops up:

[Screenshot: list of built-in transliteration maps]

Click on the desired transliteration map. As an example, let us click on the French (to English) transliteration map.

The Rules window changes immediately to show the French characters and their English equivalents.

[Screenshot: Translit rule showing the French map]

You can edit any entry in this list, add new entries, or delete entries.

Note that such editing does not alter the saved version of the map; the edits apply to this one use only. If you select the same map again, ReNamer loads the original version, not the edited one. How to keep your changes is described in the section Saving a transliteration map below.

Next, select the rule's parameters as shown below:

Parameter       Details
forward         Transliterates left to right, as defined in the map: each left-hand part is replaced by its right-hand part (e.g. ü becomes ue).
backward        Transliterates right to left, as defined in the map: each right-hand part is replaced by its left-hand part (e.g. ue becomes ü).
skip extension  If this check box is selected, the rule ignores the file extension.
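The skip extension option can be pictured as applying the conversion to the base name only. In this sketch `rename_with_translit` and `to_latin` are hypothetical helpers, and the extension is split off with Python's `os.path.splitext` (text from the last dot); ReNamer's own definition of the extension may differ.

```python
import os


def rename_with_translit(filename, convert, skip_extension=True):
    """Apply convert() to a file name, optionally leaving the extension untouched."""
    if skip_extension:
        base, ext = os.path.splitext(filename)  # "Müller.über" -> ("Müller", ".über")
        return convert(base) + ext
    return convert(filename)


def to_latin(name):
    """Stand-in for a real Translit map: only handles ü."""
    return name.replace("ü", "ue")


print(rename_with_translit("Müller.über", to_latin))                        # Mueller.über
print(rename_with_translit("Müller.über", to_latin, skip_extension=False))  # Mueller.ueber
```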

Finally, press the Add Rule button to add the rule to the rule stack.

Making your own transliteration map

Click in the Translit Alphabet window and start entering your custom alphabet.

A transliteration alphabet consists of pairs (couples) of equivalent parts, entered one pair per line, with the two parts separated by "=" (the equal sign). The alphabet should not contain spaces, and it does not need separate upper- and lower-case entries (case is adjusted automatically, as described in Automatic case conversion above). Also, put pairs whose parts contain more characters at the top, so that they are processed first and are not partially processed by shorter pairs. For example, if a map contains both ж=zh and з=z, list ж=zh first; otherwise, in the backward direction, the "z" of "zh" would be matched by з=z. Below is a simple example:

щ=sh
ю=yu
я=ya
ь='
э=e
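The alphabet format and the ordering advice above can be checked mechanically. The sketch below parses an alphabet into pairs and reports any pair that an earlier, shorter pair would partially consume. Both functions are hypothetical helpers; the check assumes first-match-wins processing in list order, and skipping blank lines is an assumption about how ReNamer treats them.

```python
def parse_alphabet(text):
    """Parse a Translit alphabet: one "left=right" pair per line."""
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        if " " in line:
            raise ValueError(f"line {line_no}: the alphabet should not contain spaces")
        if "=" not in line:
            raise ValueError(f"line {line_no}: missing '=' in {line!r}")
        left, right = line.split("=", 1)
        # case is adjusted automatically, so store both parts in lower case
        pairs.append((left.lower(), right.lower()))
    return pairs


def shadowed_pairs(pairs, backward=False):
    """Find pairs whose match text starts with that of an earlier, shorter pair."""
    side = 1 if backward else 0  # forward matches left parts, backward right parts
    problems = []
    for j, later in enumerate(pairs):
        for earlier in pairs[:j]:
            a, b = earlier[side], later[side]
            if len(b) > len(a) and b.startswith(a):
                problems.append((earlier, later))
    return problems


alphabet = "щ=sh\nю=yu\nя=ya\nь='\nэ=e"
print(shadowed_pairs(parse_alphabet(alphabet), backward=True))      # []
print(shadowed_pairs(parse_alphabet("з=z\nж=zh"), backward=True))  # [(('з', 'z'), ('ж', 'zh'))]
```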

After entering all pairs, press the Add Rule button to add the rule to the rule stack.

Note that the map is not saved yet; it was composed for a one-time use. The next section shows how to save a map.

Saving a transliteration map

To save a newly composed transliteration map:

  1. Press the transliteration maps button.
    The menu of transliteration maps pops up (the same menu shown earlier).
  2. Select the last option (Save Translit...).
    The Save Translit window pops up:
    [Screenshot: Save Translit window]
  3. Enter a new name for the map and press OK. The new map is saved.

Saving an edited transliteration map works the same way, except that the Save Translit window already shows the current map's name. Press OK to save your changes to that map, or enter a new name to save the edited version as a new map.

The new map's name is added to the map list.

From then on, the new map is available just like the built-in ("standard") maps.

Unicode character forms

Have you encountered a case where some characters are not converted, even though a visually identical character is defined in the Translit alphabet?

A Unicode character can be encoded either as a single precomposed code point or as a base character followed by combining characters. For example, ü can be stored as U+00FC, or as u followed by U+0308 COMBINING DIAERESIS. Both forms display identically, but their binary content differs. Conversion between these forms is specified by the Unicode Normalization standard (Unicode Standard Annex #15).

Translit alphabets are normally written with precomposed characters, so text that uses combining characters is not matched and stays unconverted. You can run a piece of text through a Unicode analyzer to see exactly how each character is encoded and to spot combining characters.
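A few lines of Python can serve as a simple Unicode analyzer and show the two forms of ü side by side. This uses only the standard `unicodedata` module; `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(text):
    """Tiny Unicode analyzer: list the code point and name of every character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


precomposed = "\u00fc"  # ü as a single code point
combining = "u\u0308"   # u followed by COMBINING DIAERESIS; also displays as ü

print(precomposed == combining)                                # False
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
for code, name in describe(combining):
    print(code, name)
# U+0075 LATIN SMALL LETTER U
# U+0308 COMBINING DIAERESIS
```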

To handle every form of a visually identical character, you can either define all of its forms in the Translit alphabet, or strip the combining characters away using the "Strip unicode marks" option of the Clean Up rule.
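Stripping combining marks can be sketched with the same module: decompose the text (NFD) and drop every character in a Unicode mark category. This is one common technique and only an approximation of the Clean Up option; this page does not say how "Strip unicode marks" is implemented, and `strip_marks` is a hypothetical name.

```python
import unicodedata


def strip_marks(text):
    """Decompose text (NFD) and drop combining marks (categories Mn, Mc, Me)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.category(ch).startswith("M"))


print(strip_marks("Mu\u0308ller"))  # Muller (combining form)
print(strip_marks("M\u00fcller"))   # Muller (precomposed form is decomposed first)
```

In this sketch both forms end up identical, so any later rule sees the same text either way; the trade-off is that the diacritic itself is gone, so a pair such as ü=ue no longer applies to that text.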