#11 2009-01-11 04:27

eR@SeR
Senior Member
From: Земун, Србија
Registered: 2008-01-23
Posts: 353

Re: TrID library for detecting file extension

Andrew wrote:

...TrID will show only those extensions that it identifies as being possibly correct, with the greatest advantage being that it will moreover display them from left to right in decreasing order of probability! Cool, eh?

Cool, but I think there is something wrong with your script. I tested it on my 82 video files and got:
avi ---> avi/,    wmv ---> wmv/cat  or  wmv ---> wma/wmv/cat,   flv ---> cel.
Also  xls ---> xls/,  jpg ---> jpg/mp3,  mp3 ---> (too many different detections).
It seems that since you added the slash as a separator, the script appends "/" to some (not all) extensions that were already correct, and adds detections that TrID Script v3 doesn't produce (like cat, cel...). Check it yourself; once that is corrected, your script will be a masterpiece.
By the way, can you make the corrected script turn mpg/mpeg into just mpg, so we don't have to edit the detection manually?

Last edited by eR@SeR (2009-01-11 18:37)


TRUTH, FREEDOM, JUSTICE and FATHERLAND are the highest morale values which human is born, lives and dies for!

Offline

#12 2009-01-11 09:41

Andrew
Senior Member
Registered: 2008-05-22
Posts: 542

Re: TrID library for detecting file extension

The extra "/" at the end is a trivial issue and I've already taken care of it. As for the rest, let me explain how TrID works. If I run it from the command line on one of my video files, for example, I get this:

36.5% (.M4V) MPEG-4 Video (70005/3/23)
32.8% (.M4R) iPhone Ringtone (63004/2/19)
25.3% (.3G2) 3GPP2 multimedia audio/video (48504/2/20)
 2.6% (.MOV) QuickTime Movie (5000/1/2)
 1.5% (.MP4) Generic MP4 container (3004/2)
 0.5% (.) MacBinary 2 header (1002/3)
 0.5% (.ABR) Adobe PhotoShop Brush (1002/3)

As you can see, it ranks extensions in decreasing order of probability, according to the number of matches it gets for magic bytes, strings etc. Denis' earlier script displayed only the topmost extension, i.e. the first one returned, which has the highest probability. Since you requested that all possible extensions be shown and the user given a choice, I modified the script accordingly. Of course, it now shows every extension, even absurd ones with very low probability.
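
The percentages in that listing can be reproduced from the scores. Here is a minimal sketch in Python (rather than PascalScript, so that it runs outside ReNamer). It assumes that the first number in each parenthesised group is the match score for that result; `SAMPLE` and `percentages` are names made up for this illustration and are not part of TrID.

```python
# Scores copied from the sample output above. Assumption: the first number
# in each "(70005/3/23)" group is the match score for that result.
SAMPLE = [
    ("m4v", 70005),
    ("m4r", 63004),
    ("3g2", 48504),
    ("mov", 5000),
    ("mp4", 3004),
    ("", 1002),      # the MacBinary 2 header entry has no extension
    ("abr", 1002),
]


def percentages(results):
    """Return (extension, percent) pairs, each score as a share of the total."""
    total = sum(points for _, points in results)
    if total == 0:
        return []
    return [(ext, points * 100 / total) for ext, points in results]


if __name__ == "__main__":
    for ext, pct in percentages(SAMPLE):
        print(f"{pct:5.1f}%  .{ext}")
```

Computed this way, the values agree with TrID's listing to within a tenth of a percent, which suggests (but does not prove) that each probability is simply that result's share of the total score.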

I had already pondered this issue and as a compromise between the two extreme approaches (show only 1 vs show all), I thought that I should keep a cut-off, say 10 or 15%. So in that case I would display only extensions with probability >= 10 or 15%. But I faced some issues with that code and have asked for Marco's help. Meanwhile I released the previous script with no cut-off defined. As soon as Marco replies, I will release the next version.
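
The cut-off idea can be sketched as follows. This is a Python illustration under stated assumptions, not the script's actual code: `pick_extensions` is a hypothetical helper, each result is assumed to be an (extension, score) pair, and the sample scores are taken from the listing above.

```python
def pick_extensions(results, cutoff=20.0):
    """Join the extensions whose share of the total score exceeds `cutoff` percent."""
    total = sum(points for _, points in results)
    if total == 0:
        return ""
    kept = []
    for ext, points in results:
        share = points * 100 / total
        ext = ext.lower()
        # Skip empty extensions and exact duplicates.
        if share > cutoff and ext and ext not in kept:
            kept.append(ext)
    # Joining a list puts "/" only between entries, never at either end.
    return "/".join(kept)


if __name__ == "__main__":
    sample = [("M4V", 70005), ("M4R", 63004), ("3G2", 48504),
              ("MOV", 5000), ("MP4", 3004), ("", 1002), ("ABR", 1002)]
    print(pick_extensions(sample))       # m4v/m4r/3g2
    print(pick_extensions(sample, 35))   # m4v
```

A cut-off of 0 degenerates into "show all" and a very high one into "show at most the top result", so the two extreme approaches are just the ends of the same setting.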

The "mpg/mpeg" issue can be solved in 2 ways. The better/official way will be to:

1) Edit the XML for that file type (video-mpeg.trid.xml) and change the contents of the <Ext>MPG/MPEG</Ext> tag
2) Create a new TrIDDefs.trd file for personal use using TrIDDefsPack.exe, or send the updated XML to Marco so he can include it in the definitions library

The quick-fix (and IMO wrong) way is to create a special exception in the script. I believe that the official definition should remain "mpg/mpeg" (letting the user decide), but with the next script I can give you a single line of code that will solve your problem.

Last but not least, as we improve TrID's library, the correct extensions will get the highest probabilities, rise to the top and override the others. That's why I plan to keep submitting new definitions. Also, the more files a definition is built from, the better it will naturally be. For example, not all AVIs or MP4s are exactly alike, since AVI and MP4 are just containers and the internal contents can vary greatly depending on codecs etc. Thus the more AVIs or MP4s we scan with TrIDScan.exe, the better the resulting definitions will be for those file types.

Last edited by Andrew (2009-01-14 08:11)

Offline

#13 2009-01-12 07:06

Andrew
Senior Member
Registered: 2008-05-22
Posts: 542

Re: TrID library for detecting file extension

Got Marco's reply, so here's the new script:

{ TrID Script v5 }

// v1 - Original script by Denis
// v2 - Modified by Andrew to output lowercase extensions and detect presence of TrIDLib.dll and TrIDDefs.trd
// v3 - Modified by Andrew to keep existing extensions if not identified properly
// v4 - Modified by Andrew to display multiple possible extensions
// v5 - Modified by Andrew to display multiple possible extensions with cutoff of 20%

// TrID DLL exported functions
function TridAnalyze: Integer;
  external 'TrID_Analyze@TrIDLib.dll stdcall';
function TridSubmitFileA(szFileName: PChar): Integer;
  external 'TrID_SubmitFileA@TrIDLib.dll stdcall';
function TridLoadDefsPack(szPath: PChar): Integer;
  external 'TrID_LoadDefsPack@TrIDLib.dll stdcall';
function TridGetInfo(lInfoType: Integer; lInfoIdx: Integer; sBuf: PChar): Integer;
  external 'TrID_GetInfo@TrIDLib.dll stdcall';

// Constants for TrID_GetInfo
const
  TRID_GET_RES_NUM     = 1; // Get the number of results available
  TRID_GET_RES_FILEEXT = 3; // Filetype extension
  TRID_GET_RES_POINTS  = 4; // Matching points

function GetExtensions(): String;
var
  Str1, Str2: String;
  I, NumRes, NumPts, TotPts: Integer;

begin
  Result := '';
  TotPts := 0; // Initialize the running total before summing the points
  SetLength(Str1, 100);
  NumRes := TridGetInfo(TRID_GET_RES_NUM, 0, PChar(Str1));
  if (NumRes > 0) then
  begin
    for I := 1 to NumRes do
      TotPts := TotPts + TridGetInfo(TRID_GET_RES_POINTS, I, PChar(Str1));
    for I := 1 to NumRes do
    begin
      NumPts := TridGetInfo(TRID_GET_RES_POINTS, I, PChar(Str1));
      if ((NumPts*100/TotPts) > 20) then
      begin
        TridGetInfo(TRID_GET_RES_FILEEXT, I, PChar(Str1));
        Str2 := LowerCase(String(PChar(Str1)));
        if (Length(Str2)>0) And (Pos(Str2, Result)=0) then
        begin
          if (I > 1) then Result := Result + '/';
          Result := Result + Str2;
        end;
      end;
    end;
  end;
end;

var
  Exts: String;
  Initialized, FileErr: Boolean;

begin
  if (not Initialized) then
  begin
    if (not WideFileExists('TrIDLib.dll')) Or (WideFileSize('TrIDLib.dll')=0)
    Or (not WideFileExists('TrIDDefs.trd')) Or (WideFileSize('TrIDDefs.trd')=0) then
    begin
      if (DialogYesNo('Error! TrIDLib.dll and/or TrIDDefs.trd not found in the program directory!'
      + #13#10#13#10 + 'Do you want to download the latest versions from the TrID website now?')) then
        ShellOpenFile('http://mark0.net/soft-trid-e.html');
      FileErr := True;
    end;
    TridLoadDefsPack('');
    Initialized := True;
  end;
  if (FileErr) then Exit;
  TridSubmitFileA(PChar(FilePath));
  if (TridAnalyze <> 0) then
  begin
    Exts := GetExtensions();
    if (Length(Exts) > 0) then FileName := WideExtractBaseName(FileName) + '.' + Exts;
  end;
end.

The cutoff is kept at 20%, but it's extremely easy to change.

P.S. Additional fix for eR@SeR:

Change this:

begin
  if (I > 1) then Result := Result + '/';
  Result := Result + Str2;
end;

to this:

begin
  if (I > 1) then Result := Result + '/';
  if (Str2 = 'mpg/mpeg') then Str2 := 'mpg'; // Add this line!
  Result := Result + Str2;
end;

Last edited by Andrew (2009-01-25 17:31)

Offline

#14 2009-01-14 01:05

eR@SeR
Senior Member
From: Земун, Србија
Registered: 2008-01-23
Posts: 353

Re: TrID library for detecting file extension

Andrew wrote:

Denis' earlier script was displaying only the top-most extension, i.e. the 1st one returned with max. probability attached. Since you requested that all possible extensions should be shown and the user given a choice, I modified the script accordingly. Of course, now it shows all extensions even down to absurd ones with very low probability.
.
.
...but for you with the next script I can provide a single line of code that will solve your problem.
.
.
P.S. Additional fix for eR@SeR:

Nice workaround. For now, everything seems OK. If I notice something unclear or have any questions, I'll let you know. Thanks for the mpg script line.

Andrew wrote:

Last but not least, let me say that when we improve TrID's library, then the highest probabilities will rise to the top and override the others. So that's the reason why I plan to keep submitting new definitions. Also, the more files we run it on, the better a definition will be naturally.

Once again, thank you for your effort and persistence in creating these scripts. Keep going to achieve better detections for the TrID database, and for the scripts too, if possible of course!

Last edited by eR@SeR (2009-01-14 12:33)



Offline

#15 2009-01-14 08:19

Andrew
Senior Member
Registered: 2008-05-22
Posts: 542

Re: TrID library for detecting file extension

Well, creating/modifying scripts is useful, fun and a good learning experience as well! Others on this forum (including you) have RegEx covered very well (thanks for that!), so I'm trying to be helpful with scripts in my own small way...

Offline

#16 2009-01-28 02:12

LuisLlamas
Member
Registered: 2009-01-28
Posts: 3

Re: TrID library for detecting file extension

There is a simple way to solve the "mpg/mpeg" or "wma/wmv" issue. You only have to add a new Delete rule that removes, for example, "/wmv" or "mpg/", or everything after the "/" symbol.
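
As a rough illustration of the "everything after the / symbol" variant, here is a small Python sketch of the logic such a rule would apply. It is not a ReNamer rule, and `first_alternative` and `apply_to_name` are hypothetical names used only here.

```python
def first_alternative(ext):
    """Keep only the part before the first '/' of a combined extension."""
    return ext.split("/", 1)[0]


def apply_to_name(filename):
    """Apply first_alternative to the extension of `filename`, if it has one."""
    base, dot, ext = filename.rpartition(".")
    if not dot:
        return filename  # no extension to clean up
    return base + dot + first_alternative(ext)


if __name__ == "__main__":
    print(apply_to_name("clip.mpg/mpeg"))  # clip.mpg
    print(apply_to_name("song.wma/wmv"))   # song.wma
```

Note that this always keeps the first (most probable) alternative, so a video detected as "wma/wmv" would end up as wma; deleting a specific part such as "/wmv" or "wma/" gives more control.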

Offline

#17 2009-01-28 16:52

Andrew
Senior Member
Registered: 2008-05-22
Posts: 542

Re: TrID library for detecting file extension

Yes, that should work for eR@SeR (and others) as well. Anyway, as I pointed out above, I don't think it should be applied in all situations and thus I presented it as a separate addition to the script for only those who really want it.

Offline

#18 2011-11-26 11:31

Andrew
Senior Member
Registered: 2008-05-22
Posts: 542

Re: TrID library for detecting file extension

Latest iteration of the script, as per the discussion here:

[Script deleted, see below for latest version.]

Last edited by Andrew (2011-11-26 22:56)

Offline

#19 2011-11-26 19:09

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,374

Re: TrID library for detecting file extension

@Andrew

1) I would suggest using:

Path := WideExtractFilePath(GetApplicationPath());

Instead of:

Path := WideReplaceText(GetApplicationPath(), 'ReNamer.exe', '');

2) Also, that line should go into the initialization section instead of being called for every processed file, unless the result is used anywhere else.

3) You can also remove any checking against "TrIDLib.dll" library (redundant). Your script will never actually execute if the DLL is not found.
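
To show why point 1 matters, here is a hedged Python sketch of the two approaches (Python's `ntpath` module stands in for the ReNamer functions, which behave similarly but are not identical). `folder_by_replace`, `folder_by_split` and the renamed executable `Renamer64.exe` are made up for illustration.

```python
import ntpath  # Windows path rules, usable on any platform


def folder_by_replace(exe_path):
    """Strip a hard-coded executable name; does nothing if the file was renamed."""
    return exe_path.replace("ReNamer.exe", "")


def folder_by_split(exe_path):
    """Split off the last path component, whatever it is called."""
    folder = ntpath.dirname(exe_path)
    # Keep a trailing backslash so a file name can be appended directly.
    return folder if folder.endswith("\\") else folder + "\\"


if __name__ == "__main__":
    for path in (r"C:\Tools\ReNamer.exe", r"C:\Tools\Renamer64.exe"):
        print(folder_by_replace(path), "|", folder_by_split(path))
```

With the renamed executable, the replace-based version returns the full path to the file instead of its folder, while the split-based version still returns the folder.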

Offline

#20 2011-11-26 23:41

Andrew
Senior Member
Registered: 2008-05-22
Posts: 542

Re: TrID library for detecting file extension

Thanks for the comments, Denis! I have appended the latest version of the script below. I have some questions, though:

1) TridLoadDefsPack() creates a 0-byte TrIDDefs.trd if the file doesn't already exist in the proper folder. Why is this? Is it the fault of the library itself, or something to do with the script?

2) You've mentioned on the Wiki page for this script that "At the moment works only on ANSI filenames (non Unicode)." Why is this? I see that in Delphi we have Char, String (both ANSI), WideChar, WideString (both Unicode), PChar, PString (pointers to ANSI strings), PWideChar and PWideString (pointers to Unicode strings). But in PascalScript (according to the Types Wiki page) we only have Char, String, WideChar, WideString and PChar. Is the lack of Unicode support a fault of the library itself, or something to do with PascalScript not supporting PString, PWideChar and PWideString? Also, any reason why these are missing from PascalScript?
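
To make the ANSI limitation in question 2 concrete, here is a small Python sketch. It only illustrates what "does not fit in ANSI" means and says nothing about where the limitation comes from. It assumes that the "A" in TridSubmitFileA follows the usual Windows convention for ANSI string parameters and that the system code page is 1252; `fits_ansi` is a hypothetical helper.

```python
def fits_ansi(filename, codepage="cp1252"):
    """Return True if every character of `filename` exists in `codepage`."""
    try:
        filename.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for name in ("movie.avi", "Земун.avi"):
        print(name, fits_ansi(name))
```

A name that fails this check cannot be passed losslessly through a single-byte string in that code page, so the file would have to be renamed or copied to an ANSI-safe name before being submitted.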

-----

{ TrID Script v6 }

// v1 - Original script by Denis
// v2 - Modified by Andrew to output lowercase extensions and detect presence of TrIDLib.dll and TrIDDefs.trd
// v3 - Modified by Andrew to keep existing extensions if not identified properly
// v4 - Modified by Andrew to display multiple possible extensions
// v5 - Modified by Andrew to display multiple possible extensions with cutoff of 20%
// v6 - Modified by Andrew to fix a path-related issue

// Important: Download http://mark0.net/download/tridlib-free.zip and http://mark0.net/download/triddefs.zip,
// then extract TrIDLib.dll and TrIDDefs.trd to ReNamer.exe's folder/directory or else the script will fail

// TrID DLL exported functions
function TridLoadDefsPack(szPath: PChar): Integer;
  external 'TrID_LoadDefsPack@TrIDLib.dll stdcall';
function TridSubmitFileA(szFileName: PChar): Integer;
  external 'TrID_SubmitFileA@TrIDLib.dll stdcall';
function TridAnalyze: Integer;
  external 'TrID_Analyze@TrIDLib.dll stdcall';
function TridGetInfo(lInfoType: Integer; lInfoIdx: Integer; sBuf: PChar): Integer;
  external 'TrID_GetInfo@TrIDLib.dll stdcall';

// Constants for TrID_GetInfo etc.
const
  TRID_GET_RES_NUM     = 1;  // Get the number of results
  TRID_GET_RES_FILEEXT = 3;  // Filetype extension
  TRID_GET_RES_POINTS  = 4;  // Matching points
  TRID_GET_RES_CUTOFF  = 20; // Cutoff percentage for results

function GetExtensions(): WideString;
var
  Str1, Str2: String;
  I, NumRes, NumPts, TotPts: Integer;

begin
  Result := '';
  TotPts := 0; // Initialize the running total before summing the points
  SetLength(Str1, 100);
  NumRes := TridGetInfo(TRID_GET_RES_NUM, 0, PChar(Str1));
  if (NumRes > 0) then
  begin
    for I := 1 to NumRes do
      TotPts := TotPts + TridGetInfo(TRID_GET_RES_POINTS, I, PChar(Str1));
    for I := 1 to NumRes do
    begin
      NumPts := TridGetInfo(TRID_GET_RES_POINTS, I, PChar(Str1));
      if ((NumPts*100/TotPts) > TRID_GET_RES_CUTOFF) then
      begin
        TridGetInfo(TRID_GET_RES_FILEEXT, I, PChar(Str1));
        Str2 := LowerCase(String(PChar(Str1)));
        if (Length(Str2)>0) And (Pos(Str2, Result)=0) then
        begin
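          // Add the separator only between entries, so that a skipped first
          // result (e.g. one with an empty extension) cannot leave a leading '/'
          if (Length(Result) > 0) then Result := Result + '/';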
          Result := Result + Str2;
        end;
      end;
    end;
  end;
end;

var
  Initialized: Boolean;
  AppPath, FileExts: WideString;

begin
  if (not Initialized) then
  begin
    Initialized := True;
    AppPath := WideExtractFilePath(GetApplicationPath());
    if (TridLoadDefsPack(AppPath) = 0) then
    begin
      WideDeleteFile(AppPath + 'TrIDDefs.trd');
      if (DialogYesNo('Error! TrIDDefs.trd not found in the program directory (' + AppPath + ')!'
      + #13#10#13#10 + 'Do you want to download the latest version from the TrID website now?')) then
        ShellOpenFile('http://mark0.net/soft-trid-e.html');
      Exit;
    end;
  end;
  if (TridSubmitFileA(PChar(FilePath)) <> 0) then
  begin
    if (TridAnalyze <> 0) then
    begin
      FileExts := GetExtensions();
      if (Length(FileExts) > 0) then
        FileName := WideStripExtension(FileName) + '.' + FileExts;
    end;
  end;
end.

-----

EDIT: Changed the function used for stripping the extension to WideStripExtension. Also, published on the wiki: wiki/ReNamer:Scripts:TrID

Last edited by Andrew (2011-11-27 14:29)

Offline
