#1 2009-10-03 05:22

narayan
Senior Member
Registered: 2009-02-08
Posts: 470

Add "Non-English characters" option to the Strip Rule

I come across a lot of non-English characters in file names.
(=Any Unicode symbol that is meant for any non-English language.)

The Strip rule is supposed to eliminate these non-English characters.

To save labor, I have saved a preset containing a Strip rule that lists all the non-English characters I have come across. I have set ReNamer to load this preset at startup, and whenever I find new non-English characters, I add them to it.

But I face a difficulty here: ReNamer shows every non-English character as a thick vertical line (not in its original shape). A run of them looks like ||||||||||||.

As a result, I cannot tell whether the unstripped portion of the file name contains duplicates of some character. (Just copying the whole non-English part into the Strip rule would add a lot of duplicate characters and bloat it.)

Thus I have to add only one non-English character at a time.

This makes it very tedious to set up the Strip rule.

To show just how difficult it is, I will describe what I have to do:

I have created a preset, where I keep adding new non-English characters that I come across.

A typical addition of a new non-English character involves these steps:
1. Start ReNamer and load the target file and the preset.
2. Check the preview to see if ALL the non-English characters are cleaned up.
3. If not, follow this subroutine:
3a. Press F2, and select exactly one non-English character from the target file.
3b. Press CTRL+C to copy this character on the clipboard.
3c. Double-click on the Strip Rule to edit it.
3d. Click in its User specified field, and press CTRL+V to paste the character.
3e. Save the rule.
3f. Check the Preview pane to see if the files are cleaned up now.
      If not, repeat steps 3a-3e.

This has to be done one character at a time to avoid adding duplicate characters in the list.
(Otherwise the list would be bloated).

And probably at the end of this all, I will end up adding entire languages to the Strip rule!

Suggestion:
It would be far simpler to have a "Strip all non-English characters" option.
Or else show the actual shape of each non-English character, so the user can select all unique characters, rather than wading through a long series of |||||||||||| characters.

This would be useful to all users who use pure English names.

Last edited by narayan (2009-10-03 05:24)


#2 2009-10-03 06:20

narayan
Senior Member
Registered: 2009-02-08
Posts: 470

Re: Add "Non-English characters" option to the Strip Rule

Two additional ideas:

1. Extend this facility to non-English users:
To make this useful to non-English users (Spanish, French, German...) as well, consider whether language-specific subsets of Unicode could be declared as a "white-list".

2. White-list
Instead of declaring strip this and strip that, let the user simply declare a single white-list.

(A white-list is the opposite of a black-list.)

The idea is that any character that is not in this white list will be stripped.
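
A minimal sketch of the white-list idea as a ReNamer PascalScript rule, offered as an illustration only and not as an existing ReNamer option. The WHITELIST constant and its contents are assumptions made for this example; the script keeps a character only if it appears in that list.

```pascal
const
  // Assumed white-list, for illustration only; extend it as needed.
  WHITELIST = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 ._-()[]';
var
  I: Integer;
  Kept: WideString;
begin
  Kept := '';
  for I := 1 to WideLength(FileName) do
    // Keep the character only if it appears in the white-list.
    if WidePos(WideCopy(FileName, I, 1), WHITELIST) > 0 then
      Kept := Kept + WideCopy(FileName, I, 1);
  FileName := Kept;
end.
```

A name made up entirely of non-listed characters would come out empty (or as a bare extension), so the preview is worth checking before renaming.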

*********
Since we are talking about stripping entire character sets (and not individual characters), it would probably be better to add this feature to the "Cleanup" rule rather than the "Strip" rule.


#3 2009-10-04 11:30

SafetyCar
Senior Member
Registered: 2008-04-28
Posts: 446

Re: Add "Non-English characters" option to the Strip Rule

I have this for me, maybe you should try it...

begin
    FileName := Copy(FileName, 1, Length(FileName));
    FileName := WideReplaceText(FileName, '?', '');
end.

(Note that it uses Copy instead of WideCopy in order to ignore most complex characters.)


If this software has helped you, consider getting your pro version. :)


#4 2009-10-04 14:16

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,374

Re: Add "Non-English characters" option to the Strip Rule

narayan, your "||||||||||||" issue might be a general font problem: the font itself doesn't have the glyphs for those characters.

On Unicode platforms (Win 2K, XP, Vista) the font is set to "MS Shell Dlg" which is Unicode and Smoothing capable. On older platforms (Win 98, Me) the font will be "MS Sans Serif".

I don't know where to squeeze this option in; maybe it is better to use PascalScript, as SafetyCar has suggested?

SafetyCar, you should use the WideToAnsi function, which is guaranteed to break all characters that are not in the local code page.


#5 2009-10-04 14:22

narayan
Senior Member
Registered: 2009-02-08
Posts: 470

Re: Add "Non-English characters" option to the Strip Rule

If this is possible, let's have it! :P

@locating the option:
Both strip and cleanup rules are good candidates.

  • "strip" because the non-English characters are being stripped.

  • "Cleanup" in the sense of cleaning up the "junk" characters.

Until then, I can use this PascalScript.
BTW, could you post the script with the WideToAnsi function? Thanks!

Last edited by narayan (2009-10-04 14:25)


#6 2009-10-04 15:19

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,374

Re: Add "Non-English characters" option to the Strip Rule

This will do it; you'll just have to replace those "?" marks in place of non-local characters.

FileName := WideToAnsi(FileName);


#7 2009-10-04 15:44

narayan
Senior Member
Registered: 2009-02-08
Posts: 470

Re: Add "Non-English characters" option to the Strip Rule

you'll just have to replace those "?" marks in place of non-local characters

What do you mean by that?


#8 2009-10-04 19:09

Stefan
Moderator
From: Germany, EU
Registered: 2007-10-23
Posts: 1,161

Re: Add "Non-English characters" option to the Strip Rule

narayan wrote:

you'll just have to replace those "?" marks in place of non-local characters

What do you mean by that?

narayan, what den4b means is that if you use
FileName := WideToAnsi(FileName);

there could be some '?' signs left, because the local code page
has no replacement for those characters.

Then just use SafetyCar's script

    FileName := Copy(FileName, 1, Length(FileName));
    FileName := WideReplaceText(FileName, '?', '');




BTW, I would prefer
    FileName := WideReplaceText(FileName, '?', '_');


You may want to search google for wiki glyph scan code font.

BTW, this was Denis's post number 1,111 !!!  Thank you, Denis, for all your time and support.

Last edited by Stefan (2009-10-04 19:12)


Read the  *WIKI* for HELP + MANUAL + Tips&Tricks.
If ReNamer had helped you, please *DONATE* to Denis or buy a PRO license. (Read *Lite vs Pro*)


#9 2009-10-04 19:29

narayan
Senior Member
Registered: 2009-02-08
Posts: 470

Re: Add "Non-English characters" option to the Strip Rule

Denis mentioned earlier that the WideToAnsi function is guaranteed to break all characters outside the local code page (but apparently the Copy function is not). He also said I'd have to replace the "?" characters in a second step.

So, would the following combination give guaranteed results?

FileName := WideToAnsi(FileName);
FileName := WideReplaceText(FileName, '?', '');

Secondly, why would you prefer to insert a "_" in place of the "?" characters, rather than just deleting them? Do you have any particular purpose in mind?

Last edited by narayan (2009-10-04 19:32)


#10 2009-10-04 19:41

Stefan
Moderator
From: Germany, EU
Registered: 2007-10-23
Posts: 1,161

Re: Add "Non-English characters" option to the Strip Rule

narayan wrote:

So, would the following combination give guaranteed results?

It would be nice if you would try it and tell us. I have no examples to test with.
Maybe you could also post some example file names
and provide a zip archive with empty test files for others who want to test it too.

narayan wrote:

Secondly, why would you prefer to insert a "_" in place of the "?" characters, rather than just deleting them?
Do you have any particular purpose in mind?

No, I just guess I would want to know that there was an unresolved character there, to better guess the original meaning.
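
For anyone who wants to test, here is a hedged sketch that sidesteps the code page question: instead of relying on WideToAnsi, it keeps only characters with a code below 128 (plain ASCII) and puts a placeholder in place of everything else. The PLACEHOLDER constant is an assumption made for this example (use '_' to mark the spot, or '' to simply delete), and the sketch has not been tried against real file names.

```pascal
const
  // Hypothetical setting: '_' marks removed characters, '' deletes them.
  PLACEHOLDER = '_';
var
  I: Integer;
  Clean: WideString;
begin
  Clean := '';
  for I := 1 to WideLength(FileName) do
    if Ord(FileName[I]) < 128 then
      Clean := Clean + WideCopy(FileName, I, 1)  // plain ASCII: keep as is
    else
      Clean := Clean + PLACEHOLDER;              // anything else: replace
  FileName := Clean;
end.
```

Unlike WideToAnsi, this also replaces accented letters that happen to exist in the local code page, which matches the "pure English names" goal more closely.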

