#1 2008-12-16 18:24

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,367

TrID library for detecting file extension

After reading Andrew's post in the topic "Update for binary signature", I checked out the TrID library and found it very powerful at detecting the correct extension of files. Below is a script that integrates the TrIDLib.dll library into ReNamer.

Requirements:
1) Latest dev version of ReNamer (tested with Beta from 5 Dec 2008)
2) TrIDLib.dll in ReNamer's folder (http://mark0.net/code-tridlib-e.html)
3) TrIDDefs.TRD in ReNamer's folder (http://mark0.net/soft-trid.html)

{ TrID Script }

// TrID DLL exported functions
function TridAnalyze: integer;
  external 'TrID_Analyze@TrIDLib.dll stdcall';
function TridSubmitFileA(szFileName: pchar): integer;
  external 'TrID_SubmitFileA@tridlib.dll stdcall';
function TridLoadDefsPack(szPath: pchar): integer;
  external 'TrID_LoadDefsPack@tridlib.dll stdcall';
function TridGetInfo(lInfoType: integer; lInfoIdx: integer; sBuf: pchar): integer;
  external 'TrID_GetInfo@tridlib.dll stdcall';

// Constants for TrID_GetInfo
const
  TRID_GET_RES_NUM         = 1;    // Get the number of results available
  TRID_GET_RES_FILETYPE    = 2;    // Filetype descriptions
  TRID_GET_RES_FILEEXT     = 3;    // Filetype extension
  TRID_GET_RES_POINTS      = 4;    // Matching points
  TRID_GET_VER             = 1001; // TrIDLib version (major * 100 + minor)
  TRID_GET_DEFSNUM         = 1004; // Number of filetypes definitions loaded

// Returns up to Max extensions suggested by TrID, joined with '|'.
// Returns '' when TrID has no result for the submitted file.
function GetTopExtensions(Max: Integer): String;
var
  S: String;
  Num, I: Integer;
begin
  Result := '';
  SetLength(S, 100); // buffer that TrID_GetInfo writes its text into
  Num := TridGetInfo(TRID_GET_RES_NUM, 0, PChar(S));
  if Num > 0 then
  begin
    if Num > Max then
      Num := Max;
    for I := 1 to Num do // result indexes start at 1
    begin
      TridGetInfo(TRID_GET_RES_FILEEXT, I, PChar(S));
      if I > 1 then Result := Result + '|';
      Result := Result + String(PChar(S));
    end;
  end;
end;

var
  Initialized: Boolean;

begin
  // Load the definitions only once, when the first file is processed
  if not Initialized then
  begin
    TridLoadDefsPack('');
    Initialized := True;
  end;
  TridSubmitFileA(PChar(FilePath));
  if TridAnalyze <> 0 then
    FileName := WideExtractBaseName(FileName) + '.' + GetTopExtensions(1);
end.


#2 2008-12-17 16:45

Andrew
Senior Member
Registered: 2008-05-22
Posts: 542


Now this is simply awesome, Denis!

I'm going to check it out right now... Also, I hope the script will be included henceforth in the ReNamer package itself as part of the Scripts directory.


#3 2008-12-17 18:33

eR@SeR
Senior Member
From: Земун, Србија
Registered: 2008-01-23
Posts: 353


Now this is simply awesome...

I thought it would be too, but after some testing I decided to stick with Detect using binary signature (Dubs)...

Here are my reasons (TrID):

- all generated extensions are in UPPER CASE, even for files whose extension was not changed
- mpg is detected as MPG/MPEG; it could be just mpg, for example
- I added 82 video files (asf, avi, mpg, wmv) to the file table and it took more than 30 seconds to show the preview; what will happen with more files...
- most wmv files are detected as wma; some are left unchanged
- xlsx (Excel 2007) is not detected
- only Unicode txt files are detected; ANSI-encoded ones are not
- dll is detected as exe
- some jar games are detected as zip (this also occurs in Dubs)
- 3gp is detected as 3g2 (Dubs doesn't detect it at all)
+ doc and xls are detected individually (ppt comes out as PPS/PPT)

There are probably similar issues still waiting to be found...

We should "invest" in Dubs because of these issues. That is my opinion.

Edit: Removed... I didn't pay attention to Denis's Requirements section. Sorry...

Cheers

Last edited by eR@SeR (2008-12-18 00:35)


TRUTH, FREEDOM, JUSTICE and FATHERLAND are the highest moral values for which a human is born, lives and dies!


#4 2008-12-18 05:19

Andrew
Senior Member
Registered: 2008-05-22
Posts: 542


I just checked it out and it works brilliantly! Just a few points though:

1) I personally prefer lowercase extensions but the script generates them in uppercase.

2) If the TrIDLib.dll file is not present, then no error is thrown and the new filename is simply the same as the old filename.

3) If the TrIDDefs.trd file is not present, then a 0-byte file is created and the extensions returned by GetTopExtensions() are all empty, which causes an invalid names error whenever two files share the same base name (e.g. File.txt and File.doc both simply become File, which is erroneous, whereas File1.txt and File2.doc cause no such issue).

I modified the script a bit to deal with conditions 1), 2) and 3) as follows:

{ TrID Script }

// TrID DLL exported functions
function TridAnalyze: integer;
  external 'TrID_Analyze@TrIDLib.dll stdcall';
function TridSubmitFileA(szFileName: pchar): integer;
  external 'TrID_SubmitFileA@tridlib.dll stdcall';
function TridLoadDefsPack(szPath: pchar): integer;
  external 'TrID_LoadDefsPack@tridlib.dll stdcall';
function TridGetInfo(lInfoType: integer; lInfoIdx: integer; sBuf: pchar): integer;
  external 'TrID_GetInfo@tridlib.dll stdcall';

// Constants for TrID_GetInfo
const
  TRID_GET_RES_NUM         = 1;    // Get the number of results available
  TRID_GET_RES_FILETYPE    = 2;    // Filetype descriptions
  TRID_GET_RES_FILEEXT     = 3;    // Filetype extension
  TRID_GET_RES_POINTS      = 4;    // Matching points
  TRID_GET_VER             = 1001; // TrIDLib version (major * 100 + minor)
  TRID_GET_DEFSNUM         = 1004; // Number of filetypes definitions loaded

function GetTopExtensions(Max: Integer): String;
var
  S: String;
  Num, I: Integer;

begin
  Result := '';
  SetLength(S, 100);
  Num := TridGetInfo(TRID_GET_RES_NUM, 0, PChar(S));
  if Num > 0 then
  begin
    if Num > Max then
      Num := Max;
    for I:=1 to Num do
      begin
        TridGetInfo(TRID_GET_RES_FILEEXT, I, PChar(S));
        if I > 1 then Result := Result + '|';       
        Result := Result + WideLowerCase(String(PChar(S)));
      end;
  end;
end;

var
  Initialized, FileErr: Boolean;

begin
  if not Initialized then
  begin
    // Check both files up front; the definitions are loaded only when both
    // are present, so no 0-byte TrIDDefs.trd gets created (see point 3)
    if (not WideFileExists('TrIDLib.dll')) or (WideFileSize('TrIDLib.dll') = 0)
      or (not WideFileExists('TrIDDefs.trd')) or (WideFileSize('TrIDDefs.trd') = 0) then
    begin
      WideShowMessage('Error! TrIDLib.dll or TrIDDefs.trd is missing or empty in the program directory!');
      FileErr := True;
    end
    else
      TridLoadDefsPack('');
    Initialized := True;
  end;
  if FileErr then
    Exit;
  TridSubmitFileA(PChar(FilePath));
  if TridAnalyze <> 0 then
    FileName := WideExtractBaseName(FileName) + '.' + GetTopExtensions(1);
end.

4) I was also trying to make the script direct the user to the web page where the .DLL file is available, instead of showing a plain error message. But something like
ExecuteProgram('http://www.den4b.com', false); or even
ExecuteProgram('iexplore http://www.den4b.com', false);
doesn't seem to work, unfortunately (though the latter command works under Windows). So is there any way to launch a URL in the default system browser using PascalScript?

5) I'm going to look into some of eR@SeR's points and see what can be done about them. I obviously feel that this is a far better approach due to reasons outlined in the original post where I gave my suggestion to Denis. Since the TrID definitions file can be added to by users, I'm sure the misidentification problems can be dealt with easily.

-----

Edit: My query #4 has been solved with the addition of ShellOpenFile() as mentioned here: http://www.den4b.com/forum/viewtopic.php?id=551
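
For anyone with the same question, here is a minimal sketch of how the missing-DLL case might use ShellOpenFile(). It assumes ShellOpenFile accepts a URL as a plain string and hands it to the Windows shell; the exact signature is not given in this thread, so check the linked topic before relying on it. Warned is a hypothetical flag, added so the browser opens once per batch rather than once per file.

```pascal
{ Sketch only: open the TrIDLib download page when the DLL is missing }

var
  Warned: Boolean; // hypothetical flag; relies on globals persisting between files

begin
  if (not Warned) and (not WideFileExists('TrIDLib.dll')) then
  begin
    Warned := True;
    WideShowMessage('TrIDLib.dll was not found. Its download page will now be opened.');
    // Assumption: ShellOpenFile takes the URL as a string
    ShellOpenFile('http://mark0.net/code-tridlib-e.html');
  end;
end.
```

This is meant to be pasted into a ReNamer PascalScript rule, like the scripts above; it will not compile as a standalone program.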

Last edited by Andrew (2011-11-27 20:51)


#5 2008-12-18 09:03

Andrew
Senior Member
Registered: 2008-05-22
Posts: 542


Alrighty then, here's the rebuttal...

- all generated extensions are in UPPER CASE, even for files whose extension was not changed

Dealt with already.

- xlsx (Excel 2007) is not detected

Huh? It is for me, and TrID is able to differentiate between doc/docx and xls/xlsx successfully.

- mpg is detected as MPG/MPEG; it could be just mpg, for example

This is just a DOS vs. Windows/other OS issue. Some people might prefer MPEG, just like they prefer JPEG or HTML, whereas others might prefer MPG, JPG and HTM respectively. IMO, it's always better to show all options and give the user a choice.

However, this also points to a general problem with any filetype detection code, since often different filetypes may not really have magic bytes sufficiently different to distinguish between them. This problem is not only for TrID, but for ReNamer and I guess for any other program which does filetype identification.

For example, ReNamer identifies a file (originally Desk.cpl) as Desk.exe|com|dll|drv|sys|ocx|cpl|scr|vxd. So how is that any better? If anything, TrID's definitions database, since it is user-submitted, is vastly superior to ReNamer's (I too submitted some defs to TrID just now). For instance, ReNamer identifies a file as File.doc|ppt|xls where TrID correctly identifies it as File.doc, and ReNamer identifies .docx/.pptx files as .zip (since they are compressed files), whereas TrID does better and identifies them as .docx/.pptx respectively.

The advantage of the TrID script above is obvious. We can change it to our liking to deal with any issues like this. As far as I can see, we have two solutions:

1) Automatically select any one extension from the several suggested. This sounds silly to me but some may like this as it will be less irritating and won't require user intervention.

2) Ask the user to select an extension from the suggestion list for every doubtful case. Obviously somewhat irritating and not automatic, but again some may prefer this.

The script can be modified for whatever is the preferred solution, else you can suggest something as well. My personal preference is to keep the script as-is and do as Denis did, that is just display the extension as exe|com|dll|drv|sys|ocx|cpl|scr|vxd and let the user decide.
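
For solution 1), a minimal sketch of picking the first suggestion automatically might look like this. FirstAlternative is a hypothetical helper, not part of any script in this thread; it assumes alternatives are separated by '/' (as in MPG/MPEG) or by '|' (as joined by GetTopExtensions). It is written as a plain Free Pascal console program so it can be tried outside ReNamer; the function body uses only string indexing and Copy, so it should carry over to the script with little change.

```pascal
program FirstAlternativeDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: returns the part of Ext before the first '/' or '|'.
// If there is no separator, Ext is returned unchanged.
function FirstAlternative(const Ext: string): string;
var
  I, Cut: Integer;
begin
  Cut := Length(Ext) + 1;
  for I := 1 to Length(Ext) do
    if (Ext[I] = '/') or (Ext[I] = '|') then
    begin
      Cut := I;
      Break;
    end;
  Result := Copy(Ext, 1, Cut - 1);
end;

begin
  WriteLn(FirstAlternative('mpg/mpeg'));    // prints mpg
  WriteLn(FirstAlternative('exe|com|dll')); // prints exe
  WriteLn(FirstAlternative('doc'));         // prints doc
end.
```

In the script this would wrap the call, e.g. FirstAlternative(GetTopExtensions(1)), at the cost of silently discarding the other candidates.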

- most wmv files are detected as wma; some are left unchanged
- dll is detected as exe
- some jar games are detected as zip (this also occurs in Dubs)
- 3gp is detected as 3g2 (Dubs doesn't detect it at all)

In case of incorrect extension detection (rather than multiple extension detection), we can do the hard work ourselves and add to TrID's database immediately (and also send the new/refined def to Marco for inclusion in the next version of the DB), but in ReNamer's case we need to contact Denis, provide him with the magic bytes or the problem files, then wait for him to add to the program and release a new version. Which do you think makes more sense?

- only Unicode txt files are detected; ANSI-encoded ones are not

I didn't really get this. Can you provide some examples?

- I added 82 video files (asf, avi, mpg, wmv) to the file table and it took more than 30 seconds to show the preview; what will happen with more files...

I have several observations to make here:

a) You're talking about an internal compiled function (so-called 'dubs') vs. an external Pascal script, that is interpreted at run-time, which accesses an external .DLL's functions, which further access an external (huge) definitions file... Any guesses on which will be faster?

b) I have already noted that I think filetype detection is not ReNamer's job. If ReNamer also were to identify internally as many files as TrID with as much accuracy (TrID uses many more checks like unique strings etc.), I'm sure it would be only slightly faster compared to TrID, unlike the huge difference that exists now.

c) Finally, I don't think speed is such an issue at all. After all, just how many times will this feature be used anyway? A little delay is certainly preferable when the performance is so much better.

We should "invest" in Dubs because of these issues. That is my opinion.

Let me answer this with specific points:

a) Let me repeat, I don't think this is a job that ReNamer should do at all, primarily because it is not a simple task that will stop at just a few file types. There are literally millions of file formats out there (many with overlapping extensions) with new ones being created daily. ReNamer's core function is renaming and certainly not filetype detection.

b) Adding support for all those formats will be a huge drain on Denis' time and energy. Also, if we do a cost-benefit analysis, it will be a sheer waste as it will only re-invent the wheel and duplicate the efforts of TrID's author and of its many users, all for a feature that few will use often in ReNamer anyway.

c) The obvious advantage of an external definitions database is the ease of signature addition by users (less work for Denis!), plus the fact that adding thousands of signatures internally will only serve to make ReNamer unnecessarily bloated. Why should ReNamer's .EXE suddenly grow by over 1.5 MB in size (that's roughly the size of the TrID defs file)?

d) When we add defs to the TrID library (using TrIDScan), we not only save Denis' precious time, we also help thousands of other users who use either TrID, or apps that in turn use TrIDLib. Isn't that wonderful?

e) I never thought that TrID support would be integrated so soon into ReNamer. That's the power of PascalScript I guess! Also, the script approach is customisable, further extendable and also optional for those who don't care about such stuff. It keeps the main ReNamer .EXE small, which is great. If I had my way, I would actually remove the 'dubs' function and internal filetype signature DB from ReNamer completely, so that the .EXE became even smaller. But I have no issues with it being left in, though adding to it now would be useless as already mentioned above.

f) Finally, I always believe that one should go for experts in a field rather than jack-of-all-trades-master-of-none type people/programs. In case of the script, it's a perfect union between an expert filetype detection library and an expert renamer, so what could be better than getting the best of both worlds?

Let me end by saying that I'm not supporting TrID out of some false sense of attachment, just because I happened to bring it to Denis' attention. I have presented my reasons logically above. If you can show sufficient reason why the internal database should be preferred and why Denis should spend time expanding it, then we can use that with no issues. Also, I could have simply said that I'll use the script and you use 'dubs' and let's all be happy. But I didn't because "investing" in 'dubs' long-term will only require useless work for Denis and make ReNamer bloated, which IMO is obviously not the way to go.


#6 2009-01-03 06:28

eR@SeR
Senior Member
From: Земун, Србија
Registered: 2008-01-23
Posts: 353


I apologize for the long delay... I haven't had much time.

There is no doubt that TrID is a great tool with so many detectable extensions. I just pointed out some flaws (not only TrID's, as you noticed), and that's all. It's awesome and saves Denis's precious time and energy. I agree with that. Now:

When I said:

We should "invest" in Dubs because of these issues.

I meant these two in particular (the others are solvable):

- all generated extensions are in UPPER CASE, even for files whose extension was not changed

- I added 82 video files (asf, avi, mpg, wmv)...

But once you modified Denis's script to produce lower-case extensions, one big problem for me went away. I thought that was just how TrID's output had to be; obviously that's not the case.

I'm still in favour of submitting extensions to Denis to enrich the Dubs database, which is modest, but I will use TrID only when Dubs fails to detect a file or returns results like exe|com|dll|drv|sys|ocx|cpl|scr|vxd or wma|wmv|asf... Most users rarely need this option, as you said, but I use it almost every day, mostly for downloaded video and archive files. Sometimes a downloaded file has the wrong extension. Here is an mp3 song that is detected as exe. Almost unbelievable :S I cannot even convert it.

- whether Denis will remove Dubs or keep developing it, we don't know yet

- if Dubs remains and is still developed, my suggestion is to add only the missing "frequent" extensions (like SafetyCar's, for example) to fill out the database, so that ReNamer's size does not change dramatically.

- if only Dubs could detect doc(x), xls(x), ppt(x), and also wma, wmv, asf... separately, that would be a great achievement; until then TrID is the solution

I'm in favour of letting users decide which one to use.

Some (?) will use Dubs because of:

- part of ReNamer
- check it and forget it (easy setting)
- fast enough
- basic detection...

Many (?) will use TrID because of:

- its great advantages which you already mentioned
- the problems associated with Dubs...

And everyone will be satisfied.

I just hope that Denis will add your modified Pascal script to the next release; that would be great. I guess those two files (TrIDLib.dll, TrIDDefs.trd) will not be included because of their size. But something is better than nothing...

There are still wrong detections (like wmv --> wma, dll --> exe). I read on Marco's site how to submit/update extensions, but I cannot run tridscan.exe. Maybe I'm doing something wrong?
By the way, I found the problem with xlsx detection: the file was password protected. I overlooked that, so it is fine now.

About txt detection:

- open Notepad, type any characters (non-Unicode ones only), choose File/Save As "somename", select ANSI under Encoding, click the Save button, rename the file or remove its txt extension, add it to ReNamer, use { TrID script } and you'll get "somename." Reproduced? It also occurred with nfo and diz files (crack notes).

I just don't know how to get those magic bytes like SafetyCar did. Any help would be appreciated.

Andrew wrote:

4) I was also trying that instead of a plain error message, the script can direct the user to the web page from where the .DLL file is available...

That would be great.

Andrew wrote:

Let me end by saying that I'm not supporting TrID out of some false sense of attachment, just because I happened to bring it to Denis' attention.

That goes without saying. Excellent, logical, detailed rebuttal, Andrew.

Best wishes for the New Year.

#7 2009-01-03 14:36

SafetyCar
Senior Member
Registered: 2008-04-28
Posts: 446

eR@SeR wrote:

About txt detection:

- open Notepad, type any characters (non-Unicode ones only), choose File/Save As "somename", select ANSI under Encoding, click the Save button, rename the file or remove its txt extension, add it to ReNamer, use { TrID script } and you'll get "somename." Reproduced? It also occurred with nfo and diz files (crack notes).

I think this is the most problematic thing about using TrID: when it doesn't know the extension, it removes it...

eR@SeR wrote:

I just don't know how to get those magic bytes like SafetyCar did. Any help would be appreciated.

Those are the first bytes of the file, read as hexadecimal; each kind of file usually has a different start.

I used this program to see that code: Free Hex Editor XVI32, available here: http://www.chmaas.handshake.de/
I think there are many more programs like this.

(By the way, txt files do not have such a special start code.)


If this software has helped you, consider getting your pro version. :)


#8 2009-01-05 19:13

Andrew
Senior Member
Registered: 2008-05-22
Posts: 542


eR@SeR wrote:

I'm still in favour of submitting extensions to Denis to enrich the Dubs database, which is modest, but I will use TrID only when Dubs fails to detect a file or returns results like exe|com|dll|drv|sys|ocx|cpl|scr|vxd or wma|wmv|asf... Most users rarely need this option, as you said, but I use it almost every day, mostly for downloaded video and archive files. Sometimes a downloaded file has the wrong extension. Here is an mp3 song that is detected as exe. Almost unbelievable :S I cannot even convert it.

Hmm... Cannot download the file for some reason. Never happened with MediaFire before. I would have liked to analyse the file myself to see why TrID misidentifies it as .EXE (that's very funny!). BTW, what does Dubs identify it as? MP3?

Also, let me tell you about this wonderful little portable utility I use to correctly identify the codecs required to play my downloaded videos. It's called GSpot (which is a funny name, no doubt), but it is amazing! Get it from here and try it out. I guarantee you won't be disappointed. You can also check out AVIcodec (not just for AVIs) and VideoInspector.

eR@SeR wrote:

- if Dubs remains and is still developed, my suggestion is to add only the missing "frequent" extensions (like SafetyCar's, for example) to fill out the database, so that ReNamer's size does not change dramatically.

No problem with this, since Denis will surely ensure that ReNamer doesn't become too bloated with unnecessary stuff (that's why he rightly rejects many otherwise good ideas, because they're not part of ReNamer's core functionality).

eR@SeR wrote:

I'm in favour of letting users decide which one to use.

Yes, Dubs might be for the casual/occasional user whereas TrID would be for the more advanced user, especially since it is written in PascalScript.

eR@SeR wrote:

I just hope that Denis will add your modified Pascal script to the next release; that would be great. I guess those two files (TrIDLib.dll, TrIDDefs.trd) will not be included because of their size. But something is better than nothing...

Actually the TrID author will first have to be asked whether it is ok to redistribute his files. Plus it also raises questions about keeping the ReNamer package updated with the latest TrID files. At first I too thought that those files should be packaged, but then I changed my mind because it's unnecessary work for Denis to keep track of TrID changes and update the ReNamer package accordingly. Anyway for those advanced users who want it, the ReNamer manual should contain the URLs to the files and a note on how to use them, plus the actual script should be part of the package as well. That should be sufficient and those who're interested can then download the latest DLL and TRD versions and use them.

eR@SeR wrote:

There are still wrong detections (like wmv --> wma, dll --> exe). I read on Marco's site how to submit/update extensions, but I cannot run tridscan.exe. Maybe I'm doing something wrong?

Did you read the instructions on the following page: http://mark0.net/soft-tridscan-e.html? It's very nicely written, with screenshots etc. If you're getting stuck somewhere, let me know and we can investigate further. I just followed the instructions and it was very easy. My new signatures are already part of the latest TRD package released in 2009!

Also re. the DLL/EXE issue (and other similar 'misidentification' issues with very similar file types), read what TrID's author wrote to me in response to my mail:

Marco Pontello wrote:

But when the filetypes are so similar (they are all EXE, in the end, with a very similar structure), things are not always black or white.

Due to the nature of these files, don't always expect the top score / perfect ID.

At the moment, the results are to be taken just as best guesses: they will help on most occasions, but it's not a 100% perfect system.

That's why Dubs also says exe|com|dll|drv|sys|ocx|cpl|scr|vxd (though TrID is a bit better here!), 'cos internally they're all very similar or even the same and just for application purposes have been given different extensions.

eR@SeR wrote:

About txt detection:
...
I just don't know how to get those magic bytes like SafetyCar did. Any help would be appreciated.

Well, let me quote SafetyCar here in response to your query:

SafetyCar wrote:

Those are the first bytes of the file, read as hexadecimal; each kind of file usually has a different start.

(By the way, txt files do not have such a special start code.)

As SafetyCar has pointed out, the so-called "magic bytes" are nothing but a common string of bytes shared by all files of a particular type. For example, suppose all MS Word documents internally start with the string "MSWORDDOC". This may be for identification purposes, so that the program recognises which files are its own and which are falsely named DOC files (since those probably will not have the string at the beginning). So what Dubs and TrID both do is look for that string "MSWORDDOC", which would be the magic bytes for MS Word documents. If they find it, they say the extension is DOC; otherwise they say it is not. Understood?

Now the problem with text files of any type (DIZ, NFO are also just renamed ASCII text files) is that they have no magic bytes! All text files may be different, so there is no common identification string. Actually, you've assumed wrongly what Dubs is used for. Dubs = "Detect Using Binary Signature". So clearly, Dubs and TrID are meant to identify binary files and not text files. Hope that clears up any doubts you have.
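
To make the idea concrete, here is a minimal sketch of a magic-bytes check. It is not how Dubs or TrID are implemented (TrID uses more checks than this); it only compares the first bytes of a file with a known signature. "MSWORDDOC" above is a made-up example, so the sketch uses "MZ", the two bytes that Windows EXE-family files start with. ReadHeader and HasSignature are hypothetical names, and the code is a plain Free Pascal console program rather than a ReNamer script.

```pascal
program MagicBytesDemo;
{$mode objfpc}{$H+}
uses
  SysUtils, Classes;

// Reads the first Count bytes of a file (fewer if the file is shorter).
function ReadHeader(const Path: string; Count: Integer): string;
var
  Stream: TFileStream;
  Got: Integer;
begin
  Result := '';
  if Count <= 0 then
    Exit;
  Stream := TFileStream.Create(Path, fmOpenRead or fmShareDenyNone);
  try
    SetLength(Result, Count);
    Got := Stream.Read(Result[1], Count);
    SetLength(Result, Got); // trim when the file is shorter than Count
  finally
    Stream.Free;
  end;
end;

// True when the file starts with exactly the bytes in Signature.
function HasSignature(const Path, Signature: string): Boolean;
begin
  Result := (Signature <> '') and
    (ReadHeader(Path, Length(Signature)) = Signature);
end;

begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: MagicBytesDemo <file>');
    Halt(1);
  end;
  if HasSignature(ParamStr(1), 'MZ') then
    WriteLn('Starts with MZ: looks like an EXE-family file')
  else
    WriteLn('No MZ signature at the start of the file');
end.
```

A plain text file fails every such check, because nothing forces its first bytes to be any particular value.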

eR@SeR wrote:

That goes without saying. Excellent, logical, detailed rebuttal, Andrew.

Heh. Thanks for understanding!

SafetyCar wrote:

I think this is the most problematic thing about using TrID: when it doesn't know the extension, it removes it...

You're absolutely right, that is indeed irritating. But this once again proves just how great Denis's PascalScript approach to calling TrID is! With PascalScript we can always change it to suit our needs, so here's the next version of the script, which will not delete the existing extension if it cannot identify the file properly. Note that multiple identifications like MPG/MPEG will still be presented as before, since I feel the user should decide.

{ TrID Script v3 (Originally by Denis with minor changes by Andrew) }

// TrID DLL exported functions
function TridAnalyze: integer;
  external 'TrID_Analyze@TrIDLib.dll stdcall';
function TridSubmitFileA(szFileName: pchar): integer;
  external 'TrID_SubmitFileA@tridlib.dll stdcall';
function TridLoadDefsPack(szPath: pchar): integer;
  external 'TrID_LoadDefsPack@tridlib.dll stdcall';
function TridGetInfo(lInfoType: integer; lInfoIdx: integer; sBuf: pchar): integer;
  external 'TrID_GetInfo@tridlib.dll stdcall';

// Constants for TrID_GetInfo
const
  TRID_GET_RES_NUM      = 1;    // Get the number of results available
  TRID_GET_RES_FILETYPE = 2;    // Filetype descriptions
  TRID_GET_RES_FILEEXT  = 3;    // Filetype extension
  TRID_GET_RES_POINTS   = 4;    // Matching points
  TRID_GET_VER          = 1001; // TrIDLib version (major * 100 + minor)
  TRID_GET_DEFSNUM      = 1004; // Number of filetypes definitions loaded

function GetTopExtensions(Max: Integer): String;
var
  S: String;
  Num, I: Integer;

begin
  Result := '';
  SetLength(S, 100);
  Num := TridGetInfo(TRID_GET_RES_NUM, 0, PChar(S));
  if Num > 0 then
  begin
    if Num > Max then
      Num := Max;
    for I := 1 to Num do
      begin
        TridGetInfo(TRID_GET_RES_FILEEXT, I, PChar(S));
        if I > 1 then Result := Result + '|';       
        Result := Result + WideLowerCase(String(PChar(S)));
      end;
  end;
end;

var
  Initialized, FileErr: Boolean;
  Ext: String;

begin
  if not Initialized then
  begin
    // Check both files up front; the definitions are loaded only when both
    // are present, so no 0-byte TrIDDefs.trd gets created
    if (not WideFileExists('TrIDLib.dll')) or (WideFileSize('TrIDLib.dll') = 0)
      or (not WideFileExists('TrIDDefs.trd')) or (WideFileSize('TrIDDefs.trd') = 0) then
    begin
      WideShowMessage('Error! TrIDLib.dll or TrIDDefs.trd is missing or empty in the program directory!');
      FileErr := True;
    end
    else
      TridLoadDefsPack('');
    Initialized := True;
  end;
  if FileErr then
    Exit;
  TridSubmitFileA(PChar(FilePath));
  if TridAnalyze <> 0 then
  begin
    // Query TrID once; keep the current name when nothing was identified
    Ext := GetTopExtensions(1);
    if Ext <> '' then
      FileName := WideExtractBaseName(FileName) + '.' + Ext;
  end;
end.

eR@SeR wrote:

Best wishes for the New Year.

Wishing Denis, all of you and your families a very Happy New Year!
__________________________________________________________________________

Edit: OK, I was finally able to download that 'MP3' file and check it out with a hex editor. What a joke it turned out to be!

The file starts with "MZ", which implies an EXE file, but contains a "LAME3.93" tag towards the end, which implies an MP3! It seems some joker/idiot has appended an MP3 to an EXE, then renamed the resulting combined EXE+MP3 file to MP3. It is a little-known fact that MP3 players like Winamp are quite tolerant of corrupted data (since lots of MP3s get corrupted to some extent while downloading). So Winamp just skips the initial EXE part of the file and starts playing the MP3 data.

Interestingly, if you rename the file to EXE, you'll see that it's the setup file for a cracked version of Ulead GIF Animator 5! But since the EXE file format is not fault tolerant (one bad bit here and there will corrupt it), the setup doesn't run through to the end. If all the EXE data is there without truncation, then it should be possible to separate the two files to get one working EXE and one proper MP3.

So actually, since the file starts with EXE data, both Dubs and TrID identify it as EXE, which is correct! Mystery solved!

P.S. Where did you get this file? It is so weird... Maybe someone used this technique to hide the EXE and give it to someone? Who knows...

Last edited by Andrew (2009-01-05 20:11)


#9 2009-01-09 20:03

eR@SeR
Senior Member
From: Земун, Србија
Registered: 2008-01-23
Posts: 353


Andrew wrote:

Did you read the instructions on the following page...

Yes, of course... Like I said

...I cannot run tridscan.exe.

Right after I start it, it immediately shuts down. I don't know the reason. In fact, it doesn't matter. The most common extensions are detectable and that is the point. I will not submit anything and that's it. Thank you for offering help, and for quoting the author as well.

Andrew wrote:

That's why Dubs also says exe|com|dll|drv|sys|ocx|cpl|scr|vxd (though TrID is a bit better here!), 'cos internally they're all very similar or even the same and just for application purposes have been given different extensions.

Well, Dubs gives you alternatives, but TrID gives you only one detection, which is wrong in the case of dll ---> exe and, less often, wmv ---> wma. So the better solution would be to do as Denis did and give you a choice between two or more extensions. Better that than a wrong extension as the result. I hope you agree with me?

SafetyCar wrote:

I used this program to see that code: Free Hex Editor XVI32...
.
(By the way, txt files do not have such a special start code.)

Now I know how to get those 'magic bytes' and submit more extensions. Thank you.
Yes, I noticed that when I experimented with txt, diz and nfo files; that is probably why Denis didn't add detection for them...

Andrew wrote:

So clearly, Dubs and TrID are meant to identify Binary files and not Text files. Hope that clears up any doubts you have.

Yes, it's clear now. How could I have known that txt files don't have magic bytes when I had never tried it myself, i.e. didn't know how to "see" those paired digits? Right?

Andrew wrote:

...so here's the next version of the script, which will not delete the existing extension if it cannot identify it properly.

Nice one, and I hope there will be further (even better) releases of the script too.

Andrew wrote:

Interestingly, if you rename the file to EXE, you'll see that it's the setup file for a cracked version of Ulead GIF Animator 5!

Yes, I renamed it to exe as soon as I saw that something was wrong with it, and noticed that too. I ran it and the result was an error. I don't know how or why he joined that exe part to the mp3, but that "joker" succeeded in his intention.

Andrew wrote:

So actually, since the file starts with EXE data, both Dubs and TrID identify it as EXE, which is correct! Mystery solved!

...but most of it is actually an mp3 file. Winamp displays 9:34 but plays (wow) from 7:58... That 1:36 at the beginning could be the exe.

Andrew wrote:

P.S. Where did you get this file? It is so weird... Maybe someone used this technique to hide the EXE and give it to someone? Who knows...

I think I downloaded it via Limewire, and on Gnutella you can find everything, even this kind of crap neutral Obviously the wrong way to hide a cracked application lol
On the other hand, he is a genius. He joined an exe and an mp3. Remarkable, no doubt lol


TRUTH, FREEDOM, JUSTICE and FATHERLAND are the highest morale values which human is born, lives and dies for!

Offline

#10 2009-01-10 08:50

Andrew
Senior Member
Registered: 2008-05-22
Posts: 542

Re: TrID library for detecting file extension

eR@SeR wrote:

Like I said... I cannot run tridscan.exe. Right after I run it, it immediately shuts down. I don't know the reason.

Weird. What OS do you have? Vista? I'm sure the author would like to know of any possible incompatibility.

eR@SeR wrote:

Well, Dubs gives you alternatives, but TrID gives you only one detection, which is wrong in the case of dll ---> exe and, less often, wmv ---> wma. So a better solution would be what Denis did: give you a choice of two or more extensions. Better that than a wrong extension as the result. I hope you agree with me? big_smile

Yes, I do! big_smile So here, for your pleasure (and for others, of course wink), is the next version of the script, which displays multiple extensions and lets the user decide. Please note that unlike Dubs, which always shows exe|com|dll|drv|sys|ocx|cpl|scr|vxd in the same order, TrID shows only those extensions that it identifies as possibly correct and, best of all, displays them from left to right in decreasing order of probability! Cool, eh? cool

{ TrID Script v4 (Originally by Denis; modified by Andrew to display multiple possible extensions) }

// TrID DLL exported functions
function TridAnalyze: integer;
  external 'TrID_Analyze@TrIDLib.dll stdcall';
function TridSubmitFileA(szFileName: pchar): integer;
  external 'TrID_SubmitFileA@tridlib.dll stdcall';
function TridLoadDefsPack(szPath: pchar): integer;
  external 'TrID_LoadDefsPack@tridlib.dll stdcall';
function TridGetInfo(lInfoType: integer; lInfoIdx: integer; sBuf: pchar): integer;
  external 'TrID_GetInfo@tridlib.dll stdcall';

// Constants for TrID_GetInfo
const
  TRID_GET_RES_NUM     = 1; // Get the number of results available
  TRID_GET_RES_FILEEXT = 3; // Filetype extension

function GetExtensions(): String;
var
  Str, Ext: String;
  I, NumRes: Integer;

begin
  Result := '';
  SetLength(Str, 100); // buffer that TrID_GetInfo writes into
  NumRes := TridGetInfo(TRID_GET_RES_NUM, 0, PChar(Str));
  if (NumRes > 0) then
  begin
    for I := 1 to NumRes do // results come in decreasing order of probability
    begin
      TridGetInfo(TRID_GET_RES_FILEEXT, I, PChar(Str));
      Ext := LowerCase(String(PChar(Str)));
      // Compare whole items, so that e.g. "c" is not treated as a duplicate of "ocx"
      if (Ext <> '') and (Pos('/' + Ext + '/', '/' + Result + '/') = 0) then
      begin
        if (Result <> '') then Result := Result + '/';
        Result := Result + Ext;
      end;
    end;
  end;
end;

var
  Initialized, FileErr: Boolean;
  Exts: String;

begin
  if (not Initialized) then
  begin
    if (not WideFileExists('TrIDLib.dll')) Or (WideFileSize('TrIDLib.dll')=0)
      Or (not WideFileExists('TrIDDefs.trd')) Or (WideFileSize('TrIDDefs.trd')=0) then
    begin
      ShowMessage('Error! TrIDLib.dll or TrIDDefs.trd is missing from the program directory (or is empty)!');
      FileErr := True;
    end
    else
      TridLoadDefsPack(''); // load the definitions only when both files are present
    Initialized := True;
  end;
  if (FileErr) then
    Exit;
  TridSubmitFileA(PChar(FilePath));
  if (TridAnalyze<>0) then
  begin
    Exts := GetExtensions(); // query TrID once and reuse the result
    if (Exts<>'') then
      FileName := WideExtractBaseName(FileName) + '.' + Exts;
  end;
end.
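
The duplicate check in GetExtensions can be tried out on its own, outside ReNamer. A minimal sketch, assuming Free Pascal as a standalone test bed; AppendUnique is a hypothetical name, not part of the script or of TrIDLib. It compares whole items of the "/"-separated list, so a short extension such as "c" is not mistaken for a duplicate of "ocx":

```pascal
program UniqueExtensions;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: appends Ext (lower-cased) to a '/'-separated list
// unless the list already contains it as a whole item.
function AppendUnique(const List, Ext: String): String;
var
  Item: String;
begin
  Item := LowerCase(Ext);
  if (Item = '') or (Pos('/' + Item + '/', '/' + List + '/') > 0) then
    Result := List
  else if List = '' then
    Result := Item
  else
    Result := List + '/' + Item;
end;

var
  Exts: String;
begin
  Exts := '';
  Exts := AppendUnique(Exts, 'OCX');
  Exts := AppendUnique(Exts, 'c');   // kept: 'c' is not a whole item of 'ocx'
  Exts := AppendUnique(Exts, 'ocx'); // skipped: exact duplicate
  WriteLn(Exts);                     // prints ocx/c
end.
```

Wrapping both the list and the candidate in separators is what turns the plain substring search of Pos into a whole-item comparison.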

Here's the result:

inv447.png

j0va81.png

As you can see, TrID's signatures need to be updated a bit (for SYS files, for example, which I'll do and send in when time permits), but it is without doubt better than Dubs. BTW, I feel "dll/drv" reads better than "dll|drv" (the "|" can be confusing depending on the system font), so I used "/" instead of "|". I feel Denis should modify Dubs accordingly.

eR@SeR wrote:

...but most of it is actually an mp3 file. Winamp displays 9:34 but plays (wow) from 7:58... That 1:36 at the beginning could be the exe.

Well I would argue that purely in terms of the binary content, the EXE data in that file outweighs the MP3 data. The duration of the MP3 is irrelevant here as it could be of extremely low bitrate, or a very simple tone, which would imply less data. Anyway, the important thing to note here is that the file begins with EXE data, and both Dubs and TrID match from the beginning of the file and hence IMO correctly identify it as an EXE. No tool in the world will be prepared to identify such a weird file as EXE+MP3, right? wink lol
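
To illustrate why the start of the file decides the result, here is a deliberately simplified sketch, assuming Free Pascal as a standalone program. GuessByHeader is a hypothetical helper and is not how Dubs or TrID work internally; it knows only two well-known signatures, "MZ" at the start of an EXE and "ID3" at the start of an MP3 that carries an ID3v2 tag:

```pascal
program HeaderFirst;
{$mode objfpc}{$H+}

// Hypothetical helper: names a file type from the leading bytes only.
function GuessByHeader(const Data: AnsiString): String;
begin
  if Copy(Data, 1, 2) = 'MZ' then
    Result := 'exe'
  else if Copy(Data, 1, 3) = 'ID3' then
    Result := 'mp3'
  else
    Result := '';
end;

begin
  // EXE data followed by MP3 data: only the beginning is inspected.
  WriteLn(GuessByHeader('MZ<exe data>ID3<mp3 data>')); // prints exe
  // The MP3 part on its own.
  WriteLn(GuessByHeader('ID3<mp3 data>'));             // prints mp3
end.
```

Any checker that anchors its match at offset 0 will report the combined file as an EXE, however much MP3 data follows.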

P.S. BTW, let's stop discussing that file now. smile

Offline
