#1 2009-02-24 11:56

ademmm
Member
Registered: 2009-02-24
Posts: 2

Turkish Charset Problem, HTML_Title tag in UTF-8

Dear Denis, first of all, thank you for this really amazing tool.
I tried to batch rename my .html files using ":HTML_Title:", but it failed because of invalid file names. My title is MÜLK, but in preview mode I see MÃœLK.
Thanks for your efforts.

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>MÜLK</title>

Last edited by ademmm (2009-02-24 11:58)

#2 2009-02-24 13:32

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,379

Re: Turkish Charset Problem, HTML_Title tag in UTF-8

ReNamer extracts the :HTML_Title: tag as plain ANSI text without using the specified charset, which in your case is UTF-8. That is why the two UTF-8 bytes of Ü show up as the two characters Ãœ.

To fix this in general, I would have to read the charset specified in the file and convert the title from that charset to Unicode. This is a bit of a problem, because there are hundreds of different charsets.

But for UTF-8, you can use PascalScript to quickly fix this. Try the code below for a start:

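var
  Title: String;
begin
  // Read the raw HTML title; ReNamer returns it as undecoded ANSI text
  Title := CalculateMetaTag(FilePath, 'HTML_Title');
  // Decode the UTF-8 bytes and put the title in front of the current file name
  FileName := UTF8Decode(Title) + ' ' + FileName;
end.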
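
For anyone curious why the preview shows MÃœLK, here is a small sketch in Python (outside ReNamer, purely as an illustration). It assumes the ANSI code page is Windows-1252; Windows-1254 (Turkish) maps these particular bytes the same way. The function names are made up for the example.

```python
def misread_as_ansi(text: str, ansi_codepage: str = "cp1252") -> str:
    """Encode text as UTF-8, then wrongly read those bytes as ANSI."""
    return text.encode("utf-8").decode(ansi_codepage)


def repair_mojibake(garbled: str, ansi_codepage: str = "cp1252") -> str:
    """Undo the wrong reading: recover the bytes and decode them as UTF-8."""
    return garbled.encode(ansi_codepage).decode("utf-8")


garbled = misread_as_ansi("MÜLK")
print(garbled)                   # MÃœLK
print(repair_mojibake(garbled))  # MÜLK
```

The second function is roughly what UTF8Decode does in the script: it treats the text as UTF-8 bytes and decodes them properly.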

#3 2009-02-27 06:16

ademmm
Member
Registered: 2009-02-24
Posts: 2

Re: Turkish Charset Problem, HTML_Title tag in UTF-8

Thanks a lot, that worked for me. But some HTML titles contain characters that are not supported in file names, such as ? / |, and then I get the same message with the yellow alert icon.
It would be magnificent if the application ignored these characters by itself.
For example, with the title "i get the message?":
Now: alert - invalid filenames
Wanted: "i get the message" (the application ignores the ? character)

#4 2009-02-27 07:31

SafetyCar
Senior Member
Registered: 2008-04-28
Posts: 446
Website

Re: Turkish Charset Problem, HTML_Title tag in UTF-8

I don't know how to do that in PascalScript, but you could just add a Strip rule, select "User defined", and enter the symbols you don't want there.
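
Just to show what such a rule boils down to, here is a rough Python sketch (not ReNamer code). The character list is the set that Windows does not allow in file names; the function name is only for illustration.

```python
# Characters that Windows does not allow in file names
INVALID_FILENAME_CHARS = '\\/:*?"<>|'


def strip_invalid_chars(title: str) -> str:
    """Return the title with every invalid file name character removed."""
    return "".join(ch for ch in title if ch not in INVALID_FILENAME_CHARS)


print(strip_invalid_chars("i get the message?"))  # i get the message
```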


If this software has helped you, consider getting your pro version. :)

#5 2009-02-27 12:28

prologician
Member
Registered: 2009-01-30
Posts: 84

Re: Turkish Charset Problem, HTML_Title tag in UTF-8

If you know the original character set (which may or may not be easy to determine; it might be a case of trial and error if you really don't know), you could probably use what I have of my Iconv script to handle the character set conversion to Unicode. In that case, you set up the script to pull out the metadata information, then feed that into iconv, and hope that the magic happens. :)

Using Iconv is more complicated, but it applies if the original text encoding is not Unicode. Since this thread is about Turkish, the original text encoding could be ISO 8859-3, ISO 8859-9, Windows-1254, etc.
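
As a rough illustration of the trial-and-error approach, here is a Python sketch (this is not my Iconv script, and the function name and sample title are invented for the example). It decodes the same raw bytes with each candidate charset so you can see which result looks right.

```python
# Candidate charsets mentioned for Turkish text (Python codec names)
CANDIDATES = ["iso8859_3", "iso8859_9", "cp1254"]


def decode_candidates(raw: bytes, encodings=CANDIDATES) -> dict:
    """Decode raw bytes with each candidate charset.

    A value of None means the bytes are not valid in that charset.
    """
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Sample bytes as they might sit in a file saved as ISO 8859-9
raw = "Türkçe başlık".encode("iso8859_9")
for name, text in decode_candidates(raw).items():
    print(name, "->", text)
```

A wrong guess often decodes without any error and just produces the wrong letters, so the results still need to be checked by eye.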
