OK I will post example characters I come across next.
BTW, all the characters that show up as a thick rectangle are the unresolved characters, so I think replacing them with a "_" won't add anything for identification.
Last edited by narayan (2009-10-04 19:47)
I've been thinking a lil bit about your suggestion, narayan, of a multilingual character masking thingy.
As it happens, some of what's been mentioned here matches the behavior of libiconv when some special settings are enabled (I wrote a PascalScript wrapper for iconv some time ago...). Namely....
When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several characters that look similar to the original character.
When the string "//IGNORE" is appended to tocode, characters that cannot be represented in the target character set will be silently discarded.
Feasibly, all one would need to do is convert the filename to some specific character encoding with //IGNORE appended to tocode, then (if that character set isn't ASCII or something similar) convert the text back to UTF-8.
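To make the round-trip idea concrete, here is a minimal sketch in Python. It does not call libiconv: it assumes Python's built-in codec error handlers as a stand-in, with errors="ignore" playing the role of //IGNORE. The function names are made up for illustration, and the accent-stripping helper is only a crude approximation of what //TRANSLIT does, not iconv's actual transliteration tables.

```python
import unicodedata


def strip_unrepresentable(name: str, charset: str = "ascii") -> str:
    """Round-trip through `charset`, silently dropping what it cannot hold."""
    # errors="ignore" is the assumed analogue of iconv's //IGNORE suffix.
    return name.encode(charset, errors="ignore").decode(charset)


def approximate_ascii(name: str) -> str:
    """Very rough stand-in for //TRANSLIT: it only strips accents."""
    # NFKD splits an accented letter into the base letter plus a combining
    # mark; the ASCII round-trip then drops the mark and keeps the letter.
    decomposed = unicodedata.normalize("NFKD", name)
    return strip_unrepresentable(decomposed, "ascii")
```

For example, strip_unrepresentable("café_日本.txt") gives "caf_.txt", while approximate_ascii("café.txt") gives "cafe.txt". Characters with no decomposition (such as kanji) are still dropped by the second helper.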
Looks promising. Can you put out a script, then? It should be even more robust than the two versions we just discussed (otherwise it is of academic interest only, as a brain-teaser).
Hehe. Here's the thread I made some time ago on getting Iconv to interact with PascalScript (it comes in handy for me, since I often have to deal with Japanese filenames). It's not a complete wrapper for Iconv (there are some special functions that the DLL supports... one such setting falls under the realm of the problem you're describing, too), but for most purposes it works pretty nicely.
I should work on finishing that wrapper, really. Even if these additional features are nonstandard extensions (which they do appear to be, according to one of the files included with the DLL....)
Copy is not meant for this purpose. The main problem is that Copy can have an overloaded version that accepts both AnsiString and WideString parameters. It works for now, but using this function for such purposes does not guarantee the same result in the future.
WideToAnsi is a dedicated function specifically for this purpose. This explicitly converts WideString to AnsiString, rendering all non-local characters as "?" signs.
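For readers who want to see the "?" behavior outside ReNamer, here is a rough Python analogue. It is not the WideToAnsi implementation: it assumes cp1252 as the local ANSI code page and uses Python's errors="replace" handler, which also substitutes "?" for anything the code page cannot represent. Both function names are hypothetical.

```python
def wide_to_local(name: str, codepage: str = "cp1252") -> str:
    """Convert to the assumed local code page; unmappable characters become "?"."""
    # errors="replace" emits one "?" per character that does not fit.
    return name.encode(codepage, errors="replace").decode(codepage)


def mask_unresolved(name: str, mask: str = "_", codepage: str = "cp1252") -> str:
    """Swap every "?" produced by the conversion for a filename-safe mask."""
    # Assumes the input has no literal "?", which Windows filenames cannot contain.
    return wide_to_local(name, codepage).replace("?", mask)
```

With these assumptions, wide_to_local("café_日本.txt") gives "café_??.txt" and mask_unresolved("café_日本.txt") gives "café___.txt".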
I hope this explains it all.
By the way, I've added a few new options to the Strip rule. Check them out: ReNamerBeta.zip
OK now that we have a robust solution, can we have that as an option in the Cleanup (or strip) rule?
Then people will not have to go through the extra steps of copying this script, saving it and then calling it when needed.
Oops, missed that last post from you, Denis!
Putting it in three separate parts has added much more flexibility. That's clever, indeed.
Now it has many additional uses.
Thanks!
Last edited by narayan (2009-10-14 04:09)