#1 2022-01-07 19:36

9227
Member
Registered: 2022-01-07
Posts: 2

Strip unicode marks and Ł letter

In the Clean Up rule, the "Strip unicode marks" option skips the letters Ł and ł. They should be converted to L and l.


#2 2022-01-08 08:00

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,370

Re: Strip unicode marks and Ł letter

The "Strip unicode marks" option performs a character decomposition routine according to the Unicode standard.

For example, the character "Ã" (U+00C3) decomposes into "A" (U+0041) and "◌̃" (U+0303).

In contrast, characters "Ł" (U+0141) and "ł" (U+0142) do not decompose.
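To see the difference outside the program, here is a minimal Python sketch of the same idea: decompose with NFD, then drop the combining marks. It relies only on the standard `unicodedata` module. The `strip_marks` helper is a hypothetical name for illustration and is not the program's actual implementation.

```python
import unicodedata


def strip_marks(text):
    """Decompose text (NFD) and drop combining marks (categories Mn, Mc, Me)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


# "Ã" has a canonical decomposition, so the tilde can be stripped.
print(unicodedata.decomposition("Ã"))        # 0041 0303
print(strip_marks("Ã"))                      # A

# "Ł" has no decomposition at all, so there is no mark to strip.
print(repr(unicodedata.decomposition("Ł")))  # ''
print(strip_marks("Łódź"))                   # Łodz
```

The stroke in "Ł" is part of the letter itself in Unicode rather than a separate combining mark, which is why a decomposition-based approach leaves it untouched.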

There is an alternative mechanism for mapping "Ł" into "L": the Translit rule, which comes with a large collection of built-in transliteration alphabets.

For example, the built-in Polish transliteration alphabet contains the following mappings:

ą=a Ą=A ć=c Ć=C ę=e Ę=E ł=l Ł=L ń=n Ń=N ó=o Ó=O ś=s Ś=S ź=z Ź=Z ż=z Ż=Z
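As a rough illustration of how such a mapping table behaves, here is a small Python sketch that parses the pairs above and applies them with `str.translate`. The names `POLISH_PAIRS`, `build_table` and `translit` are hypothetical, and this is only an approximation of the idea, not how the Translit rule is actually implemented.

```python
import unicodedata

# The Polish mappings quoted above, as "source=target" pairs.
POLISH_PAIRS = "ą=a Ą=A ć=c Ć=C ę=e Ę=E ł=l Ł=L ń=n Ń=N ó=o Ó=O ś=s Ś=S ź=z Ź=Z ż=z Ż=Z"


def build_table(pairs):
    """Turn 'x=y' pairs into a table usable by str.translate."""
    table = {}
    for pair in pairs.split():
        src, dst = pair.split("=", 1)
        # Use the precomposed (NFC) form so each source is one code point.
        src = unicodedata.normalize("NFC", src)
        table[ord(src)] = dst
    return table


POLISH_TABLE = build_table(POLISH_PAIRS)


def translit(text):
    """Replace each mapped character; everything else passes through."""
    return unicodedata.normalize("NFC", text).translate(POLISH_TABLE)


print(translit("Łódź"))               # Lodz
print(translit("Zażółć gęślą jaźń"))  # Zazolc gesla jazn
```

Unlike mark stripping, a table like this is explicit: "Ł" becomes "L" only because the alphabet lists that pair, so characters from other languages stay as they are unless their own alphabet is applied.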


#3 2022-01-08 12:54

9227
Member
Registered: 2022-01-07
Posts: 2

Re: Strip unicode marks and Ł letter

I know the Translit rule. I just thought "Strip unicode marks" worked the same way, but for all available languages.
Thank you for the explanation!

