Difference between revisions of "ReNamer:Pascal Script:Unicode String Handling Routines"

From den4b Wiki
m (remove some msword style quotes, formatted code)
Line 3: Line 3:
 
==Unicode String Handling Routines or How to operate on words==
 
==Unicode String Handling Routines or How to operate on words==
  
And what if we have mp3 files of certain format, eg. "''author – title.mp3''" and we want to rename them into "''title - author.mp3''"? We need to split filename in some certain place (on "'' - ''") and then use created parts to build a new filename. We can achieve that with WideSplitString function that takes a string to split (Input) and a Delimiter and returns an array of strings (TStringsArray). If the Input is "''Queen - Bohemian Rhapsody''" and a Delimiter is "'' - ''" it will produce an array <nowiki>[</nowiki>"''Queen''", "''Bohemian Rhapsody''"<nowiki>]</nowiki>.
+
====Swapping parts of the FileName====
 +
What if we have mp3 files of certain format, eg. "''author – title.mp3''" and we want to rename them into "''title - author.mp3''"? We need to split filename in some certain place (on "'' - ''") and then use created parts to build a new filename. We can achieve that with '''WideSplitString''' function that takes Input (a string to split) and Delimiter paramethers and returns an array of strings (TStringsArray type). If the Input is "''Queen - Bohemian Rhapsody''" and a Delimiter is "'' - ''" it will produce an array <nowiki>[</nowiki>"''Queen''", "''Bohemian Rhapsody''"<nowiki>]</nowiki>.
  
Please pay attention that TStringsArray type arrays are zero-based, which means the index of the first element is 0. So we will get array<nowiki>[</nowiki>0<nowiki>]</nowiki> = "''Queen''" and array<nowiki>[</nowiki>1<nowiki>]</nowiki> = "''Bohemian Rhapsody''". The whole operation can be achieved with such a piece of code.
+
Please pay attention that TStringsArray type arrays are zero-based, which means the index of the first element is 0. So we will get array<nowiki>[</nowiki>0<nowiki>]</nowiki> = "''Queen''" and array<nowiki>[</nowiki>1<nowiki>]</nowiki> = "''Bohemian Rhapsody''".  
  
<u>To understand the code below you'll need basic knowledge about variables declaration, arrays and if-then-else statement.</u>
+
The whole operation can be achieved with such a piece of code.  
  
<pre><nowiki>
+
[[ReNamer:Pascal Script:Quick guide|<u>To understand the code below you'll need basic knowledge about variables declaration, arrays and if-then-else statement.</u>]]
 +
 
 +
<source>
 
var
 
var
 
   SplittedFileName: TStringsArray;
 
   SplittedFileName: TStringsArray;
Line 17: Line 20:
 
     FileName := SplittedFileName[1] + ' - ' + SplittedFileName[0] + WideExtractFileExt(FileName);
 
     FileName := SplittedFileName[1] + ' - ' + SplittedFileName[0] + WideExtractFileExt(FileName);
 
end.
 
end.
</nowiki></pre>
+
</source>
 
 
  
 
The script will produce "''Bohemian Rhapsody – Queen.mp3''" from "''Queen – Bohemian Rhapsody.mp3''".
 
The script will produce "''Bohemian Rhapsody – Queen.mp3''" from "''Queen – Bohemian Rhapsody.mp3''".
  
We are checking the length of the array SplittedFileName to ensure that we won't go out of the array bounds (if we would have a file of a different format in the files table, eg. "''Bohemian Rhapsody (Queen)''"), which would give us an error.
+
We are checking the length of the array SplittedFileName to ensure that we won't go out of the array bounds. This would happen if we would have a file of a different format in the files table, eg. "''Bohemian Rhapsody (Queen)''").
  
 +
====Splitting the FileName into words====
 
If we would like to split the FileName into words (word in this case is anything that lays between two spaces) the proper line of code would look like this:
 
If we would like to split the FileName into words (word in this case is anything that lays between two spaces) the proper line of code would look like this:
  
<pre><nowiki>
+
<source>
 
SplittedFileName := WideSplitString(WideExtractBaseName(FileName), ' ');
 
SplittedFileName := WideSplitString(WideExtractBaseName(FileName), ' ');
</nowiki></pre>
+
</source>
  
Another useful function is WideReplaceStr function. With its help we can eg. replace all appearances of ''<nowiki></nowiki>your car<nowiki></nowiki>'' phrase with ''<nowiki></nowiki>my car<nowiki></nowiki>''.
+
====Replacing parts of the FileName====
 +
Another useful function is '''WideReplaceStr''' function. With its help we can eg. replace all appearances of <nowiki>'</nowiki>''your car''<nowiki>'</nowiki> phrase with <nowiki>'</nowiki>''my car''<nowiki>'</nowiki>.
  
<pre><nowiki>
+
<source>
 
FileName := WideReplaceStr(FileName, 'your car', 'my car');
 
FileName := WideReplaceStr(FileName, 'your car', 'my car');
</nowiki></pre>
+
</source>
  
It will also change ''<nowiki></nowiki>not your car<nowiki></nowiki>'' into ''<nowiki></nowiki>not my car<nowiki></nowiki> ''and if we are really possesive and egoistic we might not like that...
+
It will also change <nowiki>'</nowiki>''not your car''<nowiki>'</nowiki> into <nowiki>'</nowiki>''not my car''<nowiki>'</nowiki> and if we are really possesive and egoistic we might not like that...
  
To solve this problem we will need few others string handling functions and procedures: WidePos, WideInsert and WideDelete. If you<nowiki>’</nowiki>re sure you won<nowiki>’</nowiki>t process any unicode characters, you may use Pos, Insert and Delete functions/procedures instead.
+
====WidePos, WideInsert and WideDelete functions====
 +
To solve the problem we will need few others string handling functions and procedures: '''WidePos''', '''WideInsert''' and '''WideDelete'''. If you<nowiki>’</nowiki>re sure you won<nowiki>’</nowiki>t process any unicode characters, you may use '''Pos''', '''Insert''' and '''Delete''' functions/procedures instead.
  
Before we start to describe them we need to tell you that '''strings in Pascal are represented as 1-based arrays of chars''' which means that the first index of string is 1 (so FileName<nowiki>[</nowiki>0<nowiki>]</nowiki> gives ''<nowiki></nowiki>out of bounds error<nowiki></nowiki>'').
+
Before we start to describe them you need to know that '''strings in Pascal are represented as 1-based arrays of chars''' which means that the first index of string is 1 (so FileName<nowiki>[</nowiki>0<nowiki>]</nowiki> gives <nowiki>'</nowiki>''out of bounds error''<nowiki>'</nowiki>).
  
 
Now we can take a look at the description of functions/procedures that were mentioned above.
 
Now we can take a look at the description of functions/procedures that were mentioned above.
  
<pre><nowiki>
+
<source>
 
function WidePos(const SubStr, S: WideString): Integer;
 
function WidePos(const SubStr, S: WideString): Integer;
</nowiki></pre>
+
</source>
  
WidePos finds a substring in given string S and returns the position of its first char.
+
'''WidePos''' finds a substring in given string S and returns the position of its first char.
  
So WidePos(<nowiki></nowiki>car<nowiki></nowiki>, <nowiki></nowiki>s'''car''' tissue<nowiki></nowiki>) will return 2.
+
So '''WidePos'''(<nowiki>'</nowiki>car<nowiki>'</nowiki>, <nowiki>'</nowiki>s'''car''' tissue<nowiki>'</nowiki>) will return 2.
  
If the substring is not present in string S the function will return 0.
+
If the substring is not present in the S string function will return 0.
  
<pre><nowiki>
+
<source>
 
procedure WideInsert(const Substr: WideString; var Dest: WideString; Index: Integer);
 
procedure WideInsert(const Substr: WideString; var Dest: WideString; Index: Integer);
</nowiki></pre>
+
</source>
  
WideInsert inserts given substring into Dest string starting from Index. So WideInsert(<nowiki></nowiki>not <nowiki></nowiki>, <nowiki></nowiki>it is my car<nowiki></nowiki>, 7) will change the Dest string into ''<nowiki></nowiki>it is not my car<nowiki></nowiki>''.
+
'''WideInsert''' inserts given substring into Dest string starting from Index. So '''WideInsert'''(<nowiki>'</nowiki>not <nowiki>'</nowiki>, <nowiki>'</nowiki>it is my car<nowiki>'</nowiki>, 7) will change the Dest string into <nowiki>'</nowiki>''it is not my car''<nowiki>'</nowiki>.
  
<pre><nowiki>
+
<source>
 
procedure WideDelete(var S: WideString; Index, Count: Integer);
 
procedure WideDelete(var S: WideString; Index, Count: Integer);
</nowiki></pre>
+
</source>
  
WideDelete deletes Count number of chars from S string starting at Index. So WideDelete(<nowiki></nowiki>it is not my car<nowiki></nowiki>, 7, 4) will change back the S string into ''<nowiki></nowiki>it is my car<nowiki></nowiki>''.
+
'''WideDelete''' deletes Count number of chars from S string starting at Index. So '''WideDelete'''(<nowiki>'</nowiki>it is not my car<nowiki>'</nowiki>, 7, 4) will change back the S string into <nowiki>'</nowiki>''it is my car''<nowiki>'</nowiki>.
  
Armed with that knowledge we can write a script that will find ''<nowiki></nowiki>your car<nowiki></nowiki>'' phrase and will check if there is a word ''<nowiki></nowiki>not<nowiki></nowiki>'' before it (no matter where exactly, but between beginning of the filename and the phrase). And only if there is no such word, it will replace ''<nowiki></nowiki>your<nowiki></nowiki>'' with ''<nowiki></nowiki>my<nowiki></nowiki>''.
+
Armed with that knowledge we can write a script that will find <nowiki>'</nowiki>''your car''<nowiki>'</nowiki> phrase and will check if there is a word <nowiki>'</nowiki>''not''<nowiki>'</nowiki> before it (no matter where exactly, but between beginning of the filename and the phrase). And only if there is no such word, it will replace <nowiki>'</nowiki>''your''<nowiki>'</nowiki> with <nowiki>'</nowiki>''my''<nowiki>'</nowiki>.
  
In opposition to the WideReplaceStr function this script will find only first appearance of searched phrase. If we would like to check all appearances, we would have to put this code into some fancy loop.
+
====Full control over Find & Replace operation====
 +
In opposition to the '''WideReplaceStr''' function this script will find only the first appearance of searched phrase. If we would like to check all appearances, we would have to put this code into some fancy loop.
  
<pre><nowiki>
+
<source>
 
var
 
var
   Car, Not_Word : Integer;
+
   Car_Index, Not_Index : Integer;
 
begin
 
begin
   Car := WidePos('your car', WideLowerCase(FileName));
+
   Car_Index := WidePos('your car', WideLowerCase(FileName));
   Not_Word := WidePos('not ', WideLowerCase(FileName));
+
   Not_Index := WidePos('not ', WideLowerCase(FileName));
   if Car > 0 then  
+
   if Car_Index > 0 then  
     if (Not_Word > 0) and (Not_Word < Car) then
+
     if (Not_Index > 0) and (Not_Index < Car_Index) then
 
       begin
 
       begin
         WideDelete(FileName, Car, Length('your'));
+
         WideDelete(FileName, Car_Index, Length('your'));
         WideInsert('my', FileName, Car);
+
         WideInsert('my', FileName, Car_Index);
 
       end;
 
       end;
 
end.
 
end.
</nowiki></pre>
+
</source>
  
I guess you<nowiki>’</nowiki>re curious why we did search ''<nowiki></nowiki>your car<nowiki></nowiki>'' and ''<nowiki></nowiki>not <nowiki></nowiki>'' phrases in lowercased filename (WideLowerCase(FileName)). We did that because WidePos function is case sensitive. Please pay attention that we didn<nowiki>’</nowiki>t change the actual case of the filename. We just passed the copy of lowercased filename string into WidePos function. This ensures that any variant of case will be found as all of them (eg. ''<nowiki></nowiki>Your Car<nowiki></nowiki>'', ''<nowiki></nowiki>YoUR caR<nowiki></nowiki>'') are identical to ''<nowiki></nowiki>your car<nowiki></nowiki>'' after lowercasing.
+
I guess you<nowiki>’</nowiki>re curious why we did search <nowiki>'</nowiki>''your car''<nowiki>'</nowiki> and <nowiki>'</nowiki>''not ''<nowiki>'</nowiki> phrases in lowercased FileName (WideLowerCase(FileName)). We did that because '''WidePos''' function is case sensitive. Please pay attention that we didn<nowiki>’</nowiki>t change the actual case of the FileName. We just passed the copy of lowercased FileName string into '''WidePos''' function. This ensures that any variant of case will be found as all of them (eg. <nowiki>'</nowiki>''Your Car''<nowiki>'</nowiki>, <nowiki>'</nowiki>''YoUR caR''<nowiki>'</nowiki>) are identical to <nowiki>'</nowiki>''your car''<nowiki>'</nowiki> after lowercasing.
  
And finally last, but not least, in this chapter will be presented WideCopy function. Let<nowiki>’</nowiki>s take a look on it<nowiki>’</nowiki>s declaration:
+
====WideCopy function====
 +
And finally last, but not least, in this chapter will be presented '''WideCopy''' function. Let<nowiki>’</nowiki>s take a look on it<nowiki>’</nowiki>s declaration:
  
<pre><nowiki>
+
<source>
 
function WideCopy(const S: WideString; Index, Count: Integer): WideString;
 
function WideCopy(const S: WideString; Index, Count: Integer): WideString;
</nowiki></pre>
+
</source>
  
 
WideCopy will return a substring of string S that starts on Index and has numbers of chars defined by Count parameter.
 
WideCopy will return a substring of string S that starts on Index and has numbers of chars defined by Count parameter.
Line 97: Line 104:
 
This means that '''WideCopy'''(<nowiki>’</nowiki>sit down<nowiki>’</nowiki>; 5, 4) will return ''<nowiki>’</nowiki>down<nowiki>’</nowiki>'' (4 letters starting from index 5).
 
This means that '''WideCopy'''(<nowiki>’</nowiki>sit down<nowiki>’</nowiki>; 5, 4) will return ''<nowiki>’</nowiki>down<nowiki>’</nowiki>'' (4 letters starting from index 5).
  
 +
====Making first letter capital====
 +
'''WideCopy''' function will let us capitalize only the first letter of the filename.
  
This function will let us capitalize only first letter of the filename.
+
<source>
 
 
<pre><nowiki>
 
 
FileName := WideUpperCase(FileName[1]) + WideLowerCase(WideCopy(FileName, 2, Length(FileName)-1));
 
FileName := WideUpperCase(FileName[1]) + WideLowerCase(WideCopy(FileName, 2, Length(FileName)-1));
</nowiki></pre>
+
</source>
  
We are building the FileName from two blocks: first is the first letter of FileName changed to uppercase and second – is the rest of the FileName made lowercase. We use ''WideCopy(FileName, 2, Length(FileName)-1)'' statement to get everything from the second letter till the end of the filename.
+
We are building the FileName from two parts: first goes uppercased first letter of the FileName and then lowercased rest of the FileName. We use '''WideCopy'''(FileName, 2, Length(FileName) - 1) statement to get everything from the second letter till the end of the FileName.

Revision as of 15:58, 13 August 2009

This article needs to be cleaned up!

Unicode String Handling Routines or How to operate on words

Swapping parts of the FileName

What if we have MP3 files named in a fixed format, e.g. "author - title.mp3", and we want to rename them to "title - author.mp3"? We need to split the filename at a certain place (at " - ") and then use the resulting parts to build a new filename. We can do that with the WideSplitString function, which takes the Input (the string to split) and Delimiter parameters and returns an array of strings (of the TStringsArray type). If the Input is "Queen - Bohemian Rhapsody" and the Delimiter is " - ", it produces the array ["Queen", "Bohemian Rhapsody"].

Note that arrays of the TStringsArray type are zero-based, which means the index of the first element is 0. So we get array[0] = "Queen" and array[1] = "Bohemian Rhapsody".

The whole operation can be done with the following piece of code.

To understand the code below you'll need basic knowledge of variable declarations, arrays and the if-then-else statement.

var
  SplittedFileName: TStringsArray;
begin
  SplittedFileName := WideSplitString(WideExtractBaseName(FileName), ' - ');
  if Length(SplittedFileName) = 2 then
    FileName := SplittedFileName[1] + ' - ' + SplittedFileName[0] + WideExtractFileExt(FileName);
end.

The script will produce "Bohemian Rhapsody - Queen.mp3" from "Queen - Bohemian Rhapsody.mp3".

We check the length of the SplittedFileName array to make sure we don't go out of the array bounds. That would happen if the files table contained a file named in a different format, e.g. "Bohemian Rhapsody (Queen)", because splitting it on " - " yields only one element.

Splitting the FileName into words

If we want to split the FileName into words (a word in this case is anything that lies between two spaces), the proper line of code looks like this:

SplittedFileName := WideSplitString(WideExtractBaseName(FileName), ' ');
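Once the name is split into words, the resulting array can be walked with an ordinary for loop. The following is only a minimal sketch, not part of the original tutorial: it rebuilds the base name with the words in reverse order. The variable names Words, NewName and I are made up for the illustration, and the TStringsArray type name is taken from the text above (it may be named differently in other ReNamer versions).

```pascal
var
  Words: TStringsArray;   // type name as used in this article (assumption for other versions)
  NewName: WideString;    // hypothetical helper variable
  I: Integer;
begin
  // Split the name (without extension) on single spaces.
  Words := WideSplitString(WideExtractBaseName(FileName), ' ');
  NewName := '';
  // The array is zero-based, so the last element is at Length(Words) - 1.
  for I := Length(Words) - 1 downto 0 do
  begin
    // Put a space before every word except the first one appended.
    if I < Length(Words) - 1 then
      NewName := NewName + ' ';
    NewName := NewName + Words[I];
  end;
  // Re-attach the original extension.
  FileName := NewName + WideExtractFileExt(FileName);
end.
```

With this sketch a name such as "one two three.txt" should become "three two one.txt".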

Replacing parts of the FileName

Another useful function is WideReplaceStr. With its help we can, e.g., replace all occurrences of the phrase 'your car' with 'my car'.

FileName := WideReplaceStr(FileName, 'your car', 'my car');

It will also change 'not your car' into 'not my car', and if we are really possessive and egoistic we might not like that...

WidePos, WideInsert and WideDelete functions

To solve this problem we need a few other string handling functions and procedures: WidePos, WideInsert and WideDelete. If you're sure you won't process any Unicode characters, you may use the Pos, Insert and Delete functions/procedures instead.

Before we describe them, you need to know that strings in Pascal are represented as 1-based arrays of chars, which means that the first index of a string is 1 (so FileName[0] gives an 'out of bounds' error).

Now we can take a look at the functions/procedures mentioned above.

function WidePos(const SubStr, S: WideString): Integer;

WidePos finds a substring in the given string S and returns the position of its first char.

So WidePos('car', 'scar tissue') will return 2.

If the substring is not present in the string S, the function will return 0.

procedure WideInsert(const Substr: WideString; var Dest: WideString; Index: Integer);

WideInsert inserts the given substring into the Dest string starting at Index. Dest is a var parameter, so it has to be a variable. If Dest holds 'it is my car', then WideInsert('not ', Dest, 7) will change it into 'it is not my car'.

procedure WideDelete(var S: WideString; Index, Count: Integer);

WideDelete deletes Count chars from the string S starting at Index. S is a var parameter as well. If S holds 'it is not my car', then WideDelete(S, 7, 4) will change it back into 'it is my car'.
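The three routines can be tried together on a local variable. This is only an illustrative sketch: the variables S and P are invented for the example, and the script does not touch FileName, so it has no effect on the files themselves.

```pascal
var
  S: WideString;  // hypothetical working string
  P: Integer;     // hypothetical position variable
begin
  S := 'it is my car';
  // Strings are 1-based: 'my' starts at the 7th char.
  P := WidePos('my', S);
  if P > 0 then
  begin
    // Insert 'not ' in front of 'my'; S becomes 'it is not my car'.
    WideInsert('not ', S, P);
    // Delete the 4 chars just inserted; S is 'it is my car' again.
    WideDelete(S, P, Length('not '));
  end;
end.
```

Checking P > 0 first matters because WidePos returns 0 when the substring is missing, and 0 is not a valid string index.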

Armed with that knowledge we can write a script that finds the phrase 'your car' and checks whether the word 'not' appears before it (no matter where exactly, as long as it is between the beginning of the filename and the phrase). Only if there is no such word will it replace 'your' with 'my'.

Full control over Find & Replace operation

Unlike the WideReplaceStr function, this script finds only the first occurrence of the searched phrase. If we wanted to check all occurrences, we would have to put this code into some fancy loop.

var
  Car_Index, Not_Index : Integer;
begin
  // Search in a lowercased copy; FileName itself is not changed here.
  Car_Index := WidePos('your car', WideLowerCase(FileName));
  Not_Index := WidePos('not ', WideLowerCase(FileName));
  if Car_Index > 0 then
    // Replace only when there is NO 'not ' before the phrase.
    if not ((Not_Index > 0) and (Not_Index < Car_Index)) then
      begin
        WideDelete(FileName, Car_Index, Length('your'));
        WideInsert('my', FileName, Car_Index);
      end;
end.

You may be curious why we searched for the 'your car' and 'not ' phrases in a lowercased FileName (WideLowerCase(FileName)). We did that because the WidePos function is case sensitive. Note that we didn't change the actual case of the FileName. We only passed a lowercased copy of the FileName string to the WidePos function. This ensures that any variant of case will be found, as all of them (e.g. 'Your Car', 'YoUR caR') are identical to 'your car' after lowercasing.

WideCopy function

Last, but not least, this chapter presents the WideCopy function. Let's take a look at its declaration:

function WideCopy(const S: WideString; Index, Count: Integer): WideString;

WideCopy returns a substring of the string S that starts at Index and has the number of chars given by the Count parameter.

This means that WideCopy('sit down', 5, 4) will return 'down' (4 letters starting from index 5).
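WideCopy also gives one possible way to write the loop mentioned in the Find & Replace section, so that every occurrence of 'your car' is checked and not just the first one. The sketch below is an assumption-laden illustration rather than the tutorial's own solution: the variables Lower, Rest, Offset, CarPos and NotPos are invented, and the search is continued by copying the unprocessed tail of the name with WideCopy.

```pascal
var
  Lower, Rest: WideString;          // hypothetical helper strings
  Offset, CarPos, NotPos: Integer;  // hypothetical helper counters
begin
  Offset := 0;  // number of chars at the start that were already processed
  repeat
    Lower := WideLowerCase(FileName);
    // Look for the phrase only in the part that was not processed yet.
    Rest := WideCopy(Lower, Offset + 1, Length(Lower) - Offset);
    CarPos := WidePos('your car', Rest);
    if CarPos > 0 then
    begin
      // Convert the position inside Rest into a position inside FileName.
      CarPos := CarPos + Offset;
      // Is there a 'not ' anywhere before the phrase?
      NotPos := WidePos('not ', WideCopy(Lower, 1, CarPos - 1));
      if NotPos = 0 then
      begin
        WideDelete(FileName, CarPos, Length('your'));
        WideInsert('my', FileName, CarPos);
        Offset := CarPos + Length('my car') - 1;    // skip the new phrase
      end
      else
        Offset := CarPos + Length('your car') - 1;  // skip the untouched phrase
    end;
  until CarPos = 0;
end.
```

Because the rule looks for 'not ' anywhere between the beginning of the name and the phrase, once a 'not ' has been passed all later occurrences are left untouched as well.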

Making first letter capital

The WideCopy function lets us capitalize only the first letter of the filename.

FileName := WideUpperCase(FileName[1]) + WideLowerCase(WideCopy(FileName, 2, Length(FileName)-1));

We build the FileName from two parts: first goes the uppercased first letter of the FileName, then the lowercased rest of the FileName. We use the WideCopy(FileName, 2, Length(FileName) - 1) expression to get everything from the second letter to the end of the FileName. Note that the rest of the FileName includes the extension, so the extension is lowercased too.
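If the extension should keep its original case, the same idea can be applied to the base name only. This is a minimal sketch under that assumption, using only functions that appear earlier in this article; the BaseName variable is made up for the example, and the length check guards against indexing an empty string.

```pascal
var
  BaseName: WideString;  // hypothetical helper variable
begin
  BaseName := WideExtractBaseName(FileName);
  // An empty base name has no first letter to capitalize.
  if Length(BaseName) > 0 then
    FileName := WideUpperCase(WideCopy(BaseName, 1, 1)) +
      WideLowerCase(WideCopy(BaseName, 2, Length(BaseName) - 1)) +
      WideExtractFileExt(FileName);
end.
```

Here WideCopy(BaseName, 1, 1) is used instead of BaseName[1] so that both parts are taken the same way.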