#1 2009-01-31 10:10

prologician
Member
Registered: 2009-01-30
Posts: 84

'Pointer' data type for ICONV.DLL?

I'm trying my hand at writing PascalScript (which, at the moment, is closer to fumbling around in the dark due to having no familiarity with Pascal or Delphi), so please forgive me in advance if this is an obvious question.

I'm trying to incorporate an external DLL into a ReNamer script, and I have the DLL function prototypes. The prototypes use void*, which seems closest to the "Pointer" data type in Delphi. However, whenever I try to run this script:

var
memStart : Pointer;

begin

end.

I receive the error "Unknown type "Pointer"". What type should be used instead of "Pointer" to serve as a typeless data pointer? (I've also tried ^Integer with a similar lack of success, so I guess this applies to typed pointers as well.)

#2 2009-02-01 13:43

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,375

Unfortunately, PascalScript does not support Pointers, at least not yet.

But don't give up, there are ways around it. Just so you know, Pointers are exactly the same as Integers memory-wise (both are 4 bytes in a 32-bit process), but they are used for different purposes. In other words, in PascalScript, you should be able to use Integers instead of Pointers to call C/C++ functions with VOID* parameters.
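
To see why an integer can stand in for a void*, here is a minimal sketch in Python (not PascalScript, since the script itself only runs inside ReNamer) using the standard ctypes module. The names POINTER_BYTES, INT32_BYTES and pointer_fits_in_int32 are made up for illustration. The substitution is only safe when a pointer is no wider than the integer, which holds in a 32-bit process and not in a 64-bit one.

```python
import ctypes

# Width of a C void* and of a 32-bit integer in the running process.
POINTER_BYTES = ctypes.sizeof(ctypes.c_void_p)
INT32_BYTES = ctypes.sizeof(ctypes.c_int32)


def pointer_fits_in_int32() -> bool:
    """True when a 32-bit integer can hold any pointer value (32-bit process)."""
    return POINTER_BYTES <= INT32_BYTES


# A pointer is just an address; ctypes exposes it as a plain integer.
address = ctypes.c_void_p(0x1234).value
```

In a 64-bit Python, pointer_fits_in_int32() returns False, which is the case where passing a handle through a 32-bit Integer would truncate it.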

Why don't you give us your function declaration and we will try to help you....

By the way, here is a similar post with some discussion that might help: Declaring a pointer before record

#3 2009-02-01 21:56

prologician
Member
Registered: 2009-01-30
Posts: 84

What I'm trying to do at this point is follow up on my own suggestion post regarding character encodings. After a bit of legwork, I found that libiconv could potentially do the job (and there are even native Windows builds); all that would be needed then is the interface to link ReNamer to iconv.dll. (What can I say, I was inspired by your efforts with TriD... :) )

(For the purposes of version consistency, I'm using this Windows build, v1.9.2.)

According to the included iconv.h in the ZIP file, the prototypes that should be used are these (I'm copying only the important lines to help in understanding):

typedef void* iconv_t;

#define iconv_open libiconv_open
extern iconv_t iconv_open (const char* tocode, const char* fromcode);

#define iconv libiconv
extern size_t iconv (iconv_t cd, const char* * inbuf, size_t *inbytesleft, char* * outbuf, size_t *outbytesleft);

#define iconv_close libiconv_close
extern int iconv_close (iconv_t cd);

There are a few other functions exported from the DLL, though they're described as "nonstandard", and I don't know if I even need them.

At this time, I have a nonfunctional PascalScript with my attempt to call these functions (actually, just libiconv_open, because I'm having trouble even with that).

const
  FROM_ENCODING = 'CP932';
  TO_ENCODING = 'UTF-8';

function IconvOpen(toCode, fromCode: pchar): LongInt;
  external 'libiconv_open@iconv.dll';

function IconvClose(cp: LongInt): Integer;
  external 'libiconv_close@iconv.dll';

var
  Initialized: Boolean;
  IconvInstance: LongInt;
  from: String;
  target: String;
begin
  SetLength(from, 100);
  SetLength(target, 100);


  from := FROM_ENCODING;
  target := TO_ENCODING;
  if not Initialized then
  begin
    iconvinstance := IconvOpen(pchar(from), pchar(target));
    initialized := true
  end;
  ShowMessage(IntToStr(iconvinstance));
  IconvClose(iconvinstance)
end.

The ShowMessage() displays a popup message with the text '-1'... which (I think) means that libiconv_open couldn't create the instance (according to the documentation, this happens either when there's no memory or when the character encodings aren't supported). However, I know that both CP932 and UTF-8 ~are~ supported, so my suspicion now is that something is wrong with the string being passed. Though I don't know what. :/

#4 2009-02-02 07:52

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,375

The most important thing that you missed is the calling convention, which is "cdecl" in this case.

You might have some trouble with the iconv function, because of the PChar type for input and output parameters. Not sure yet how to properly put FileName:WideString into inbuf:PChar and outbuf:PChar back into FileName:WideString without messing up the content (in PascalScript). Each character in a WideString occupies 2 bytes, which would be equivalent to 2 characters in the String or PChar type. But when you actually assign a WideString to a String, or the other way around, a conversion happens, i.e. 2-byte characters are mapped into 1-byte characters.
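
The size difference and the lossy assignment can be sketched outside ReNamer. This is Python rather than PascalScript, and it assumes that a WideString holds UTF-16 code units and that the ANSI code page is Windows-1252; both are assumptions for illustration, and the variable names are made up.

```python
name = "テスト.txt"  # three katakana characters plus an ASCII extension

# WideString-style storage: UTF-16, two bytes per character here
# (every character in this name is in the Basic Multilingual Plane).
wide_bytes = name.encode("utf-16-le")

# String-style storage after an implicit Wide -> ANSI conversion:
# one byte per character, and unmappable characters become '?'.
ansi_bytes = name.encode("cp1252", errors="replace")
```

Here wide_bytes is twice as long as the name, while ansi_bytes has one byte per character but has lost the katakana entirely. That loss is the "messing up the content" risk when a filename goes through a plain String.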

Anyway, experiment, and tell us how far you can go until you require more help.

function iconv_open(const tocode, fromcode: PChar): Integer;
  external 'libiconv_open@iconv.dll cdecl';

function iconv(cd: Integer; var inbuf: PChar; var inbytesleft: Cardinal;
  var outbuf: PChar; var outbytesleft: Cardinal): Cardinal;
  external 'libiconv@iconv.dll cdecl';

function iconv_close(cd: Integer): Integer;
  external 'libiconv_close@iconv.dll cdecl';

var
  Initialized: Boolean;
  cd: Integer;

begin
  if not Initialized then
  begin
    cd := iconv_open('KOI8-R', 'UTF-8');
    Initialized := True;
  end;
  ShowMessage(IntToStr(cd));
end.

P.S. I also found iconv headers for Delphi/Pascal in the libxml2-pas project; they might be useful.

#5 2009-02-03 01:17

prologician
Member
Registered: 2009-01-30
Posts: 84

At this point, I figure that the problem you're describing could be addressed either by WideToAnsi() or, in a worse case, UTF8encode(). I'll admit, I haven't thought that hard about it. Personally, I'm leaning toward WideToAnsi()... sure, it'll chew up filenames that use non-ANSI characters. Then again, so would a character set conversion. :P

I took a look at the libxml project, and borrowed a few of their ideas in the naming of types. (I'm not using Integer and Cardinal; according to Delphi Basics, their size is not necessarily fixed at 4 bytes, which is why I'm using LongInt and LongWord.)

After a bit of futzing (and realizing that I had FROM_ENCODING and TO_ENCODING swapped when I called IconvOpen()), I finally got something in a workable state. *^_^*

{Libiconv Character Encoding Conversion}

const
  FROM_ENCODING = 'CP932';
  TO_ENCODING = 'UTF-8';
  OUT_BUFFER_SIZE = 255;

type
  iconv_t = LongInt;
  size_t = LongWord;

function IconvOpen(const toCode, fromCode: pchar): iconv_t;
  external 'libiconv_open@iconv.dll cdecl';

function Iconv(cd: iconv_t; var inbuf: pchar; var inbytesleft: size_t; var outbuf: pchar; var outbytesleft: size_t): size_t;
  external 'libiconv@iconv.dll cdecl';

function IconvClose(cd: iconv_t): Integer;
  external 'libiconv_close@iconv.dll cdecl';

var
  cd: iconv_t;
  from_string: string;
  from_pchar: pchar;
  from_length: size_t;
  out_buffer: string;
  out_buffer_pchar: pchar;
  out_buffer_length: size_t;
  results: size_t;
  error: Boolean;
begin
  cd := IconvOpen(TO_ENCODING, FROM_ENCODING);

  if (cd <> -1) then
  begin
    from_string := WideToAnsi(WideExtractBaseName(FileName));
    from_pchar := pchar(from_string);
    from_length := length(from_string);
    
    //showmessage(inttostr(from_length) + ': ' + from_pchar);
    
    SetLength(out_buffer, OUT_BUFFER_SIZE);
    out_buffer_length := OUT_BUFFER_SIZE;
    out_buffer_pchar := pchar(out_buffer);
    results := iconv(cd, from_pchar, from_length, out_buffer_pchar, out_buffer_length);
    
    //showMessage(inttostr(results) + ': ' + inttostr(from_length)+ ',' + inttostr(out_buffer_length));
    SetLength(out_buffer, OUT_BUFFER_SIZE - out_buffer_length);
    
    //wideshowmessage(UTF8decode(out_buffer));
    FileName := UTF8decode(out_buffer) + WideExtractFileExt(FileName);
    
    IconvClose(cd);
  end
  else
  begin
    if not error then
      ShowMessage('Unable to create Iconv instance. Aborting.');
    error := true;
  end;
end.

It's not polished to perfection just yet... namely, it needs to handle other error cases (i.e., a partial encoding conversion), but at least all the hard stuff is taken care of. *^_^*
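
For anyone who wants to check the data flow without ReNamer, here is a rough Python sketch of what the script does, plus one way to notice a partial conversion. It is not the script itself: Python's built-in codecs stand in for iconv.dll, Windows-1252 is assumed as the ANSI code page, and repair_name and convert_prefix are hypothetical helper names.

```python
from typing import Tuple


def repair_name(garbled: str, ansi: str = "cp1252", source: str = "cp932") -> str:
    """Mirror the script: WideToAnsi, then iconv source -> UTF-8, then UTF8Decode."""
    raw = garbled.encode(ansi)                 # WideToAnsi(): recover the raw bytes
    utf8 = raw.decode(source).encode("utf-8")  # iconv(): source encoding -> UTF-8
    return utf8.decode("utf-8")                # UTF8Decode(): back to a wide string


def convert_prefix(raw: bytes, source: str) -> Tuple[str, int]:
    """Decode as much as possible; return the text and the number of bytes consumed."""
    try:
        return raw.decode(source), len(raw)
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be converted,
        # comparable to what iconv() reports through inbuf and inbytesleft.
        return raw[:err.start].decode(source), err.start
```

With these assumptions, repair_name("ƒeƒXƒg") gives back "テスト". If convert_prefix consumes fewer bytes than it was given, the conversion was only partial, which is the case the script still has to handle by checking the iconv() result and the remaining from_length.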

A quick question about instances (in this case, the conversion descriptor 'cd'). Just how persistent are the variables in a PascalScript? Do they only live for the duration of a Preview/Rename operation, or do they exist for the entire life of the program? I would prefer to avoid creating a memory leak here, so that's why I dispose of 'cd' after each use. Alternately, recycling these wouldn't be a bad option. (Of course, throwing it out is a lazier way of doing it, since setting up all the parameters for iconv() in order to reset 'cd' to its initial state is a massive pain...)

Last edited by prologician (2009-02-03 03:19)

#6 2009-02-04 06:20

prologician
Member
Registered: 2009-01-30
Posts: 84

den4b wrote:

You might have some trouble with the iconv function, because of the PChar type for input and output parameters. Not sure yet how to properly put FileName:WideString into inbuf:PChar and outbuf:PChar back into FileName:WideString without messing up the content (in PascalScript). Each character in a WideString occupies 2 bytes, which would be equivalent to 2 characters in the String or PChar type. But when you actually assign a WideString to a String, or the other way around, a conversion happens, i.e. 2-byte characters are mapped into 1-byte characters.

From what I've rummaged through, it's not as bad an issue as you might think. You can read out WideChars individually from the WideString, then apply Ord() to them to dig out a numerical value. Then just parse that number how you choose in a bytewise manner (div and mod by 256, and append the Chr() to a new String). That's the string you then play with. It's not too hard to reverse this process to get back to WideChars.
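
The Ord / div / mod 256 / Chr idea can be tried out in a few lines. This is a Python sketch rather than PascalScript; wide_to_bytes and bytes_to_wide are made-up names, and the low-byte-first order is an assumption (it matches how UTF-16 is laid out in memory on Windows).

```python
def wide_to_bytes(text: str) -> bytes:
    """Split each 16-bit character code into two bytes, low byte first."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError("character outside the BMP; needs a surrogate pair")
        out.append(code % 256)   # low byte (mod 256)
        out.append(code // 256)  # high byte (div 256)
    return bytes(out)


def bytes_to_wide(data: bytes) -> str:
    """Reverse the split: rebuild each character from its two bytes."""
    if len(data) % 2 != 0:
        raise ValueError("expected an even number of bytes")
    return "".join(chr(data[i] + 256 * data[i + 1]) for i in range(0, len(data), 2))
```

For example, wide_to_bytes("A") is the two bytes 0x41 0x00, and bytes_to_wide(wide_to_bytes(name)) returns the original name for any text within the Basic Multilingual Plane.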

A second alternative is what I used in the script already: get the text into UTF-8, then use UTF8decode() to handle the rest. Sure, it only applies in one direction, but at least the conversion is already written for me. *^_^*

Last edited by prologician (2009-02-04 07:13)

#7 2009-02-04 12:40

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,375

If your script works fine now - it's great, but here is an idea...

1) PASCAL: FileName (WideString) -> UTF-8 (String)
2) ICONV: UTF-8 (String) -> FROM_ENCODING (String)
3) ICONV: FROM_ENCODING (String) -> TO_ENCODING (String)
4) ICONV: TO_ENCODING (String) -> UTF-8 (String)
5) PASCAL: UTF-8 (String) -> FileName (WideString)

The idea is to get your filename from Unicode into FROM_ENCODING first, and then convert it to TO_ENCODING. Not sure if it will work as expected, but might be worth trying.
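
Steps 1 to 5 above can be sketched as follows. This is Python, not PascalScript, with the built-in codecs standing in for iconv.dll; iconv_like and five_step are hypothetical names, and the strict error handling is a choice made for this sketch.

```python
def iconv_like(data: bytes, from_code: str, to_code: str) -> bytes:
    """Stand-in for one iconv() call: re-encode bytes, failing on unconvertible input."""
    return data.decode(from_code).encode(to_code)


def five_step(name: str, from_encoding: str, to_encoding: str) -> str:
    utf8 = name.encode("utf-8")                                  # 1) FileName -> UTF-8
    legacy = iconv_like(utf8, "utf-8", from_encoding)            # 2) UTF-8 -> FROM_ENCODING
    target = iconv_like(legacy, from_encoding, to_encoding)      # 3) FROM -> TO
    utf8_again = iconv_like(target, to_encoding, "utf-8")        # 4) TO -> UTF-8
    return utf8_again.decode("utf-8")                            # 5) UTF-8 -> FileName
```

In this sketch, five_step("テスト", "cp932", "utf-8") returns the name unchanged, because every character survives each hop. A name containing a character that FROM_ENCODING cannot represent (for example Hangul with CP932) fails at step 2 with a UnicodeEncodeError.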

#8 2009-02-05 05:46

prologician
Member
Registered: 2009-01-30
Posts: 84

den4b wrote:

If your script works fine now - it's great, but here is an idea...

1) PASCAL: FileName (WideString) -> UTF-8 (String)
2) ICONV: UTF-8 (String) -> FROM_ENCODING (String)
3) ICONV: FROM_ENCODING (String) -> TO_ENCODING (String)
4) ICONV: TO_ENCODING (String) -> UTF-8 (String)
5) PASCAL: UTF-8 (String) -> FileName (WideString)

The idea is to get your filename from Unicode into FROM_ENCODING first, and then convert it to TO_ENCODING. Not sure if it will work as expected, but might be worth trying.

A nice thought to be sure, though character set encoding conversions aren't exactly lossless operations. If they were, then you're right, you probably could get away with what you're describing.

To try it out anyway, I rewrote the previous version of the script so that multiple conversions can be applied in the same script. Here it is:

{Libiconv Character Encoding Conversion}

const
  FROM_ENCODING = 'CP932';
  TO_ENCODING = 'UTF-8';
  OUT_BUFFER_SIZE = 255;

type
  iconv_t = LongInt;
  size_t = LongWord;
  
function IconvOpen(const toCode, fromCode: pchar): iconv_t;
  external 'libiconv_open@iconv.dll cdecl';
function Iconv(cd: iconv_t; var inbuf: pchar; var inbytesleft: size_t;
  var outbuf: pchar; var outbytesleft: size_t): size_t;
  external 'libiconv@iconv.dll cdecl';
function IconvClose(cd: iconv_t): Integer;
  external 'libiconv_close@iconv.dll cdecl';

function IconvConvert(const fromCode, toCode: pchar; const inString: String; var errorstate: Boolean): String;
var
  cd: iconv_t;
  from_string: string;
  from_pchar: pchar;
  from_length: size_t;
  out_buffer: string;
  out_buffer_pchar: pchar;
  out_buffer_length: size_t;
  iconv_results: size_t;
begin
  Result := '';

  if (errorstate) then
    exit;

  cd := IconvOpen(toCode, fromCode);

  if (cd = -1) then
  begin
    ShowMessage('Unable to create iconv instance from '#39 + fromCode + #39' to '#39 + toCode +
      #39'.'#10'Google '#39'man iconv_open'#39' for a list of supported encodings.');
    errorstate := true;
    exit;
  end;
  
  from_string := inString;
  from_pchar := pchar(from_string);
  from_length := length(from_string);

  SetLength(out_buffer, OUT_BUFFER_SIZE);
  out_buffer_length := OUT_BUFFER_SIZE;
  out_buffer_pchar := pchar(out_buffer);
  iconv_results := iconv(cd, from_pchar, from_length, out_buffer_pchar, out_buffer_length);

  if (iconv_results = 0) then
  begin
    SetLength(out_buffer, OUT_BUFFER_SIZE - out_buffer_length);
    Result := out_buffer;
  end
  else
    showmessage('Error converting from '#39 + fromCode + #39' to '#39 + toCode + #39'.');

  IconvClose(cd);
end;

var
  error: Boolean;
  stepone: String;
  steptwo: String;
begin
  stepone := UTF8encode(WideExtractBaseName(FileName));
  steptwo := IconvConvert('UTF-8', FROM_ENCODING, stepone, error);
end.

I ran this script on some CP932-encoded text and hit the error message "Error converting from 'UTF-8' to 'CP932'." This message means that iconv() returned a nonzero value, which the script treats as an error. Looking at the kinds of errors it can give (via the iconv man pages), there's really only one that makes sense: it tripped up on some character not in the input codeset and stopped converting.

Not fully unexpected, as far as I was figuring. But of course the only way to be sure was to test it out. ;)
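
One way to narrow down a failure like this is to list the characters of a name that the target encoding cannot hold. The following is a Python sketch using the built-in codecs rather than iconv.dll, so the exact set of supported characters may differ slightly from libiconv's; unrepresentable is a made-up helper name.

```python
from typing import List, Tuple


def unrepresentable(text: str, encoding: str) -> List[Tuple[int, str]]:
    """Return (index, character) pairs that cannot be encoded in the given encoding."""
    bad = []
    for index, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append((index, ch))
    return bad
```

For instance, unrepresentable("テスト한", "cp932") reports only the Hangul character at index 3, while a pure katakana name yields an empty list. A filename that displays as garbled Latin characters will likely show several entries here, which would be consistent with the UTF-8 to CP932 step stopping partway.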
