#1 2016-03-26 00:00

Elektro
Senior Member
Registered: 2014-05-28
Posts: 76

[BUG] Batch code will not find filenames if...

Hi

(More than a bug, this may be a design issue; below is my idea for an improvement that would avoid culture-specific errors.)

As far as I have seen, the default batch code generated by ReNamer does not take into account the character set in which the filenames were written.

So, for example, on a PC with a default Windows installation, this will not work:

REM Generated by ReNamer @ 2016-03-25 23:36:57
REN "ÑñçÇ.txt" "Hello World.txt"

(because the console uses code page 850 by default on my system, which does not match the encoding the batch file was written in, so the filenames will not be found)
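
To illustrate the mismatch, here is a minimal Python sketch. It is not part of ReNamer, and it assumes the batch file was saved as Windows-1252 while the console reads it as code page 850:

```python
# Assumption: the .bat file is saved as Windows-1252 (ANSI),
# but cmd.exe interprets its bytes using OEM code page 850.
name = "ÑñçÇ.txt"

raw = name.encode("cp1252")        # bytes as written into the batch file
seen_by_cmd = raw.decode("cp850")  # the name the console believes it was given

print(raw)                  # the raw bytes stored in the script
print(seen_by_cmd)          # a different, garbled name
print(seen_by_cmd == name)  # False: REN looks for a file that does not exist
```

The same bytes map to different characters in each code page, which would explain why REN cannot find the file.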

To make it work, either the end user must have changed the default cmd settings to use the Latin charset (Windows-1252), or the generated code should use the CHCP command to temporarily change the code page (the change is temporary; it only lasts for the lifetime of the CMD instance):

REM Generated by ReNamer @ 2016-03-25 23:36:57
CHCP 1252

REN "ÑñçÇ.txt" "Hello World.txt"

Also, please take into account that a standard convention in batch scripts is to use @Echo OFF at the start of the script to disable the annoying command echoing. I also think a "Pause" command is needed at the end to stop execution and let the user see the results and/or errors, so maybe this could be the final code:

@Echo OFF & Title ReNamer Batch-Script
REM Generated by ReNamer @ 2016-03-25 23:36:57

CHCP 1252
REN "ÑñçÇ.txt" "Hello World.txt"

PAUSE & EXIT /B 0

Now, the real problem is how to determine which code page to use when generating the batch code. I'm sure most users like to use ReNamer in English even if they are Russian, Bolivian, or anything else, so the user's language preference cannot be used to choose the code page written to the script; that could cause the opposite problem. A good idea, not the best but a proper and simple approach, is to determine the code page by taking all the characters in the file name columns (both old name and new name) and comparing them against the character lists in the "Translits" text files. If a match is found, use the code page associated with that Translit.
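
A rough Python sketch of this kind of detection follows. Note that it swaps in a simpler technique than the Translit lookup: it tries each candidate code page and picks the first one that can encode every file name. The candidate list and the function name `pick_code_page` are hypothetical, not anything ReNamer provides:

```python
# Hypothetical candidate list: (CHCP number, Python codec name), in priority order.
CANDIDATES = [(1252, "cp1252"), (1251, "cp1251"), (1250, "cp1250")]


def pick_code_page(names, candidates=CANDIDATES):
    """Return the first CHCP number whose codec can encode every name, else None."""
    for number, codec in candidates:
        try:
            for name in names:
                name.encode(codec)  # raises UnicodeEncodeError if a char is missing
        except UnicodeEncodeError:
            continue  # this code page cannot represent all names; try the next one
        return number
    return None


print(pick_code_page(["ÑñçÇ.txt", "Hello World.txt"]))
print(pick_code_page(["Привет.txt"]))
```

When the function returns None, no single candidate covers all the names, so the generator would need some other fallback.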

Another approach could be to use the Unicode-related Win32 functions (virtual key codes, the ToUnicodeEx function, locale IDs and culture-specific data, keyboard layout IDs, etc.), but that may be too much effort just to achieve a feature like this.

Thanks for reading!

Last edited by Elektro (2016-03-26 00:17)


#2 2016-03-26 19:49

Andrew
Senior Member
Registered: 2008-05-22
Posts: 542

Re: [BUG] Batch code will not find filenames if...

The Windows command shell still doesn't support Unicode properly and I bet never will either because MS' recommendation is to use PowerShell instead. So IMO there will always be the potential to run into trouble sometime or other due to unsupported characters.


#3 2016-03-26 20:30

Elektro
Senior Member
Registered: 2014-05-28
Posts: 76

Re: [BUG] Batch code will not find filenames if...

There is no such thing as "unsupported" characters, since a code page can be set to support each specific character set.

CMD doesn't support Unicode properly, but this ONLY affects how characters are rendered in the stdout buffer, a purely visual issue that does not matter for renaming purposes. Every Unicode character is read properly "internally" (when the proper code page is in use).

Last edited by Elektro (2016-03-26 21:18)


#4 2016-03-27 18:47

Andrew
Senior Member
Registered: 2008-05-22
Posts: 542

Re: [BUG] Batch code will not find filenames if...

Ok, but I'll be testing any such solution (if there's one) thoroughly because the last thing one would want is hundreds or thousands of mangled file names. What in your opinion should be done when the chars don't match any of the translits?


#5 2016-03-27 19:12

Elektro
Senior Member
Registered: 2014-05-28
Posts: 76

Re: [BUG] Batch code will not find filenames if...

Andrew wrote:

What in your opinion should be done when the chars don't match any of the translits?

Simple: in that case English is assumed, so no CHCP change is required.

It may not be a perfect assumption, but even allowing for a percentage of false positives, any idea is good enough as long as it is better than the current algorithm.

PS:
In either case, whether or not the character encoding detection is improved, this is a scenario in which den4b should at least generate a REM comment line warning the user about special characters and explaining how to proceed with the CHCP command.
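
As a minimal sketch of that suggestion in Python: the function name `warning_lines` and the wording of the REM lines are only illustrative assumptions, not what ReNamer actually emits:

```python
def warning_lines(names):
    """Return REM lines to prepend when any name contains non-ASCII characters."""
    if all(name.isascii() for name in names):
        return []  # plain ASCII names work under any code page; no warning needed
    return [
        "REM WARNING: some file names contain non-ASCII characters.",
        "REM If renaming fails, set a matching code page with CHCP first.",
    ]


for line in warning_lines(["ÑñçÇ.txt", "Hello World.txt"]):
    print(line)
```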

Thanks for reading

Last edited by Elektro (2016-03-27 19:45)


#6 2016-03-27 21:49

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,482

Re: [BUG] Batch code will not find filenames if...

It's possible to set the console code page to UTF8 using the following command:

CHCP 65001

Then, file names can be encoded in UTF8 inside of the batch file. Initial tests didn't highlight issues, so this could get integrated into the upcoming v6.4.0.1 Beta.
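
For illustration only, a batch file of this kind could be produced roughly as in the Python sketch below. This is not ReNamer's actual implementation; the function name `build_batch` is hypothetical, and the sketch assumes the file should be written without a byte order mark and does not escape characters such as % that are special in batch files:

```python
def build_batch(pairs):
    """Build the bytes of a UTF-8 batch file that renames each (old, new) pair."""
    lines = ["CHCP 65001"]  # switch the console code page to UTF-8 first
    for old, new in pairs:
        lines.append(f'REN "{old}" "{new}"')
    text = "\r\n".join(lines) + "\r\n"  # batch files conventionally use CRLF
    return text.encode("utf-8")         # plain UTF-8, no byte order mark


data = build_batch([("ÑñçÇ.txt", "Hello World.txt")])
with open("rename.bat", "wb") as handle:  # hypothetical output file name
    handle.write(data)
```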

For reference, file names were previously encoded using the current system ANSI code page.

Setting "@ECHO OFF" may not be a good idea. For example, you won't know which files have failed without knowing which command was executed. Scripts which use "@ECHO OFF" should normally have some kind of error handling built into the script.

Thanks for a healthy discussion with a pinch of argument ;)

Last edited by den4b (2016-03-27 21:55)


#7 2016-03-28 08:44

Elektro
Senior Member
Registered: 2014-05-28
Posts: 76

Re: [BUG] Batch code will not find filenames if...

den4b wrote:

Setting "@ECHO OFF" may not be a good idea. For example, you won't know which files have failed without knowing which command was executed. Scripts which use "@ECHO OFF" should normally have some kind of error handling built into the script.

That is true, but only if no error handling is added to the code, as you said.

@ECHO ON only generates trash/spaghetti output; it is not useful for debugging anything.

If the reason to enable it is to keep some kind of error tracking, then why not just add the missing error handling to the generated code?

I think the solution below keeps it simple and avoids code repetition by implementing a tiny procedure that performs the desired error handling with clean output and no echoing:

@Echo OFF & Title ReNamer task...
CHCP 65001

Call :DoRename "C:\not missing file" "New Name"
Call :DoRename "C:\missing file"     "New Name"
REM More files here...
Pause & Exit /B 0

:DoRename :: %1=SourceName, %2=TargetName.
((RENAME "%~1" "%~2")2>Nul && (
    ECHO: Success renaming from "%~1" to "%~2"
) || (
    ECHO: Failed to rename from "%~1" to "%~2"
)) & GOTO :EOF

PS: The code above does not cause stack overflow problems; tested with a For-Range loop (For /L) of 100,000 renaming iterations.

Thanks for reading.

Last edited by Elektro (2016-03-28 09:24)


#8 2016-03-28 10:06

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,482

Re: [BUG] Batch code will not find filenames if...

The purpose of the generated renaming batch file is to create a snapshot of renaming operations. Being a self-sufficient program with error handling and possibly other features is outside of its scope. After all, you should use ReNamer if you require more features.


#9 2016-03-29 01:54

Andrew
Senior Member
Registered: 2008-05-22
Posts: 542

Re: [BUG] Batch code will not find filenames if...

den4b wrote:

It's possible to set the console code page to UTF8 using the following command:

CHCP 65001

Yes, but that's only half the story. If the filenames need to be viewed properly then the console font needs to be changed too. Also, be sure to read the comments here. I just don't think chcp 65001 is the magic bullet that'll solve all possible problems.


#10 2016-03-29 10:26

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,482

Re: [BUG] Batch code will not find filenames if...

Andrew wrote:

Yes, but that's only half the story. If the filenames need to be viewed properly then the console font needs to be changed too. Also, be sure to read the comments here. I just don't think chcp 65001 is the magic bullet that'll solve all possible problems.

A bat/cmd file with CHCP 65001 and REN commands with UTF8-encoded file names is working fine in all tests so far. The interpretation of UTF8 characters and the display of Unicode characters are two independent issues. The interpretation part seems to be working. The display relies on a Unicode font being used for the console, but that is not required for correct interpretation of console commands.

FYI, many answers and comments on Stack Overflow are quite misleading or sometimes plain wrong.

Andrew wrote:

I just don't think chcp 65001 is the magic bullet that'll solve all possible problems.

So far there are no issues with using CHCP 65001.
