#1 2022-07-29 14:35

loungebob
Member
Registered: 2015-09-29
Posts: 45

Distilling 100s of rules into a few concise ones...

so after years of usage, I've accumulated hundreds of rules that have gotten difficult to maintain.

lately I thought about it and figured this should actually be fairly easy and doable with just a few rules. at least in my mind.

I rename shows and movies.

they start with the movie or show name, followed by the year for movies or the season/episode numbering for shows, then a bunch of metadata, then a dash with the release group name, and finally the file extension. the parts are separated by dots.

in my head, I can tell ReNamer to look at everything after S0?E?? and before .mkv and then just re-arrange the data to my liking.

like: SHOWNAME.S01E01.NF.1080p.WEB-DL.DDP5.1.HDR.x264-RG becomes: SHOWNAME.S01E01.1080p.NF.WEB-DL.HDR.DDP5.1.x264-RG

I already have manual rules to catch the above in most circumstances, plus dozens of regex rules doing the same. but is there a way to tell ReNamer to assign a value to each piece of metadata (like one for WEB-DL, HDR, HLG, x264, etc.) and then rearrange them in a specified order with just a couple of rules? and also to skip values that are not present?

the Rearrange rule sounded like what I wanted, but I've never really gotten it to work over the years.

so I use rules like these:
288) Replace: Replace all ".HDR.1080p.WEBRip.x265-" with ".1080p.NF.WEBRip.10bit.HDR.DDP5.1.x265-" (skip extension)
350) Replace: Replace using wildcards ".HMAX.WEB-DL.*.H.*-" with ".HMAX.WEBRip.$1.x$2-" (skip extension)
364) Replace: Replace using wildcards "S0?E??.*.1080p" with "S0$1E$2$3.1080p" (skip extension)
365) Replace: Replace using wildcards "S0?e??.*.2160p" with "S0$1E$2$3.2160p" (skip extension)

it seems unnecessarily complex, but that's how my brain seems to be wired lol.

I'd appreciate any input on how to clean this all up.

#2 2022-07-30 14:52

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,367

It might be very difficult, if possible at all, to replace your hundreds of specialised rules with just a few generic rules when there is a lot of variation in the tokens, their count and their order. It may turn out that a Pascal Script rule is the only way to do it.

So let's start with your first example and then see what comes next.

First, let's break down your example filename into numbered tokens and rearrange them into a new sequence:

SHOWNAME.S01E01.NF.1080p.WEB-DL.DDP5.1.HDR.x264-RG
1        2      3  4     5      6    7 8   9
SHOWNAME.S01E01.1080p.NF.WEB-DL.HDR.DDP5.1.x264-RG
1        2      4     3  5      8   6    7 9

The corresponding Rearrange rule (use without quotes):

> Split by delimiters: "."
> New pattern: "$1.$2.$4.$3.$5.$8.$6.$7.$9"
> Skip extension: yes

This will only work for filenames that have the same number of tokens, arranged in the same order.
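To see why, here is a minimal Python model of the Rearrange rule above. It assumes a single "." delimiter and that `$n` stands for the n-th token counting from 1; `rearrange` is a hypothetical helper written for this illustration, not ReNamer's own implementation.

```python
def rearrange(name: str, pattern: str, delimiter: str = ".") -> str:
    """Hypothetical model of a Rearrange rule: split on one delimiter, rebuild from $n."""
    parts = name.split(delimiter)
    result = pattern
    # Substitute higher numbers first so "$1" cannot clobber "$10", "$11", ...
    for n in range(len(parts), 0, -1):
        result = result.replace(f"${n}", parts[n - 1])
    return result


pattern = "$1.$2.$4.$3.$5.$8.$6.$7.$9"

# The example from the question: tokens 3/4 swap and HDR moves before DDP5.1.
print(rearrange("SHOWNAME.S01E01.NF.1080p.WEB-DL.DDP5.1.HDR.x264-RG", pattern))
# SHOWNAME.S01E01.1080p.NF.WEB-DL.HDR.DDP5.1.x264-RG

# Same pattern, but this release already has 1080p before NF, so they get swapped the wrong way.
print(rearrange("SHOWNAME.S01E02.1080p.NF.WEB-DL.DDP5.1.HDR.x264-RG", pattern))
# SHOWNAME.S01E02.NF.1080p.WEB-DL.HDR.DDP5.1.x264-RG
```

The pattern only knows token positions, not token values, so any name whose tokens arrive in a different order gets scrambled.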

#3 2022-07-30 15:10

loungebob
Member
Registered: 2015-09-29
Posts: 45

hi den, I think the above is what I tried and got after all those years, but then I decided not to pursue it any further since it's only useful if those °%*&ç groups followed a consistent pattern.

if I understand the above correctly, this only works if e.g. NF always comes before the resolution, and not just anywhere between the season/episode number and the file ending, right?

but what would make my loads of rules obsolete is if ReNamer looked at everything between the above-mentioned anchors, picked up e.g. NF no matter where it sits, and then put it wherever I tell it to in the order.

so basically, RN would still assign a token to NF, but I would not need to know it; I would just tell it to place NF between 1080p and WEB-DL.

oh I think I'm explaining that very confusingly lol.

there is basically a list of, let's say, 50 words (DSNP, NF, HMAX, HULU....DDP, DD, FLAC, AAC....WEB-DL, WEBRip, BCORE, BluRay.....1080p, 1080i, 2160p....x264, x265, h.264, hevc.....and so on) that can appear between the season/episode numbers and the -GROUP/.file ending.
I was wondering if I can define that list somewhere, so that ReNamer looks at everything, assigns a token to each value (I assume) and then reorders them the way I define (using the value names from the list, not token numbers, since I would not know the token numbers because the order keeps changing).
also, the above Rearrange rule would mostly not work because the number of tokens changes.

#4 2022-08-09 01:00

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,367

loungebob wrote:

there is basically a list of, let's say, 50 words (DSNP, NF, HMAX, HULU....DDP, DD, FLAC, AAC....WEB-DL, WEBRip, BCORE, BluRay.....1080p, 1080i, 2160p....x264, x265, h.264, hevc.....and so on) that can appear between the season/episode numbers and the -GROUP/.file ending.
I was wondering if I can define that list somewhere, so that ReNamer looks at everything, assigns a token to each value (I assume) and then reorders them the way I define (using the value names from the list, not token numbers, since I would not know the token numbers because the order keeps changing).
also, the above Rearrange rule would mostly not work because the number of tokens changes.

This can be achieved only with a Pascal Script rule, but it will take a bit of coding.

Can you post the list of all tokens in each group? Maybe someone will write that script for you...

#5 2022-10-17 15:20

loungebob
Member
Registered: 2015-09-29
Posts: 45

so, here we go.

this would be a sample result name:
SAMPLE.MOVIE.2022.SPANISH.IMAX.REMASTERED.PROPER.HYBRID.2160p.UHD.BluRay.REMUX.DV.10bit.HDR.TrueHD.7.1.Atmos.HEVC-SAMPLEGROUP

this is the basic structure:
(MOVIENAME).(YEAR).(LANGUAGE).(VERSION).(RELEASE).(FORM).(RESOLUTION).(SOURCE).(SOURCE FORMAT).(VARIANT).(HDR).(AUDIO).(VIDEO)-(GROUP)

I'll take care of the variables (MOVIENAME), (YEAR) and (GROUP) separately. by that I mean that rules early in the rule list will format the movie name and group name. their placement is mostly not an issue since release groups almost always stick to it.

I would need something that works for the rest below. naturally I'll have a few rules running beforehand to fix up the words so they match the tokens below.

also, not every variable/token will be present in every filename, so the script would need a way to skip variables that are not present.
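One way to skip missing variables, sketched in Python and assuming each category has already been extracted to a string (empty when absent), is to join only the non-empty ones. `join_present` and `join_then_collapse` are hypothetical helpers for illustration; the second one shows why joining everything and then squeezing repeated dots can leave a stray dot at the end, which would land right before "-GROUP".

```python
import re


def join_present(parts, sep="."):
    """Join only the categories that were found (non-empty strings)."""
    return sep.join(p for p in parts if p)


def join_then_collapse(parts):
    """Join every category with dots, then squeeze runs of dots (for comparison)."""
    return re.sub(r"\.\.+", ".", ".".join(parts))


# One string per category, LANGUAGE ... VIDEO; "" means "not present in this name".
parts = ["SPANISH", "", "PROPER", "", "1080p", "NF", "WEB-DL", "", "", "DDP5.1", ""]

print(join_present(parts))        # SPANISH.PROPER.1080p.NF.WEB-DL.DDP5.1
print(join_then_collapse(parts))  # SPANISH.PROPER.1080p.NF.WEB-DL.DDP5.1.  (stray trailing dot)
```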

ideally, the list below could be changed easily (order, naming, adding/removing entries...), but I think I can do that myself; I'm not too bad at altering and adapting existing scripts, just really bad at creating them.

(LANGUAGE)
GERMAN
FRENCH
ITALIAN
SPANISH
PORTUGUESE
JAPANESE
CHINESE
KOREAN
SWEDISH
FINNISH

(VERSION)
IMAX
DIRECTORS CUT
THEATRICAL CUT
UNRATED CUT
EXTENDED CUT
UNCUT
REMASTERED

(RELEASE)
PROPER
REPACK
RERIP
INTERNAL
DIRFIX

(FORM)
HYBRID

(RESOLUTION)
480p
1080p
1080i
2160p

(SOURCE)
NF
AMZN
DSNP
HULU
ATVP
STAN
PMTP
HMAX
PCOK
SHO
BCORE
iT
iP

(SOURCE FORMAT)
WEB-DL
WEBRip
BluRay
UHD BluRay
HDTV

(VARIANT)
REMUX

(HDR)
DV
HDR
HDR10+
HLG
SDR

(AUDIO)
DTS-HD.MA.1.0
DTS-HD.MA.2.0
DTS-HD.MA.5.1
DTS-HD.MA.6.1
DTS-HD.MA.7.1
DTS-X.7.1
TrueHD.1.0
TrueHD.2.0
TrueHD.5.1
TrueHD.5.1.Atmos
TrueHD.6.1
TrueHD.7.1
TrueHD.7.1.Atmos
FLAC.1.0
FLAC.2.0
FLAC.5.1
DD1.0
DD2.0
DD5.1
DDP2.0
DDP5.1
AAC1.0
AAC2.0
AAC5.1

(VIDEO)
H.264
H.265
x264
x265
MPEG2
AVC
HEVC
VC-1

Last edited by loungebob (2022-10-17 16:15)

#6 2022-10-20 08:41

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,367

The script below should do the job according to your specification.

Example input:
SAMPLE.MOVIE.2022.SPANISH.IMAX.REMASTERED.PROPER.HYBRID.2160p.UHD.BluRay.REMUX.DV.10bit.HDR.TrueHD.7.1.Atmos.HEVC-SAMPLEGROUP.avi

Example output:
SAMPLE.MOVIE.2022.SPANISH.IMAX.PROPER.HYBRID.2160p.BluRay.REMUX.DV.TrueHD.7.1.HEVC-SAMPLEGROUP.avi

There are some assumptions:
1. The movie name is always followed by a 4-digit year.
2. The group name does not contain dots and begins with "-".
3. Keyword definitions (see constants) must not contain RegEx meta characters other than "." and "|".
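Assumptions 1 and 2 come from the split pattern in the main block of the script. Since the Pascal Script only runs inside ReNamer, the same pattern can be tried in Python's `re`, which treats it the same way. `EPISODE_SPLIT` is a hypothetical variant for S01E01-style names and is not part of the script.

```python
import re

# Pattern used by the script: title + 4-digit year, keyword section, "-GROUP".
MOVIE_SPLIT = re.compile(r"\A(.+?\.\d{4}\.)(.+)(-[^.]+)\Z")

# Hypothetical variant for episodes (not in the posted script): anchor on SxxEyy instead.
EPISODE_SPLIT = re.compile(r"\A(.+?\.S\d{2}E\d{2}\.)(.+)(-[^.]+)\Z", re.IGNORECASE)


def split_name(base_name, pattern=MOVIE_SPLIT):
    """Return (title part, keyword section, -GROUP) or None if the name does not fit."""
    m = pattern.match(base_name)
    return m.groups() if m else None


print(split_name("SAMPLE.MOVIE.2022.SPANISH.IMAX.2160p.BluRay.HEVC-SAMPLEGROUP"))
print(split_name("SHOWNAME.S01E01.NF.1080p.WEB-DL.DDP5.1.HDR.x264-RG"))  # None
print(split_name("SHOWNAME.S01E01.NF.1080p.WEB-DL.DDP5.1.HDR.x264-RG", EPISODE_SPLIT))
print(split_name("BLADE.RUNNER.2049.2017.1080p.BluRay.x264-GROUP"))
```

A show name without a year does not match, so the script leaves it untouched. A title that itself contains a 4-digit number (BLADE.RUNNER.2049) is split at that number, so the real year lands in the keyword section, where it matches no group and gets dropped.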

const
  WordsLanguage      = 'GERMAN|FRENCH|ITALIAN|SPANISH|PORTUGUESE|JAPANESE|CHINESE|KOREAN|SWEDISH|FINNISH';
  WordsVersion       = 'IMAX|DIRECTORS CUT|THEATRICAL CUT|UNRATED CUT|EXTENDED CUT|UNCUT|REMASTERED';
  WordsRelease       = 'PROPER|REPACK|RERIP|INTERNAL|DIRFIX';
  WordsForm          = 'HYBRID';
  WordsResolution    = '480p|1080p|1080i|2160p';
  WordsSource        = 'NF|AMZN|DSNP|HULU|ATVP|STAN|PMTP|HMAX|PCOK|SHO|BCORE|iT|iP';
  WordsSourceFormat  = 'WEB-DL|WEBRip|BluRay|UHD BluRay|HDTV';
  WordsVariant       = 'REMUX';
  WordsHdr           = 'DV|HDR|HDR10+|HLG|SDR';
  WordsAudio         = 'DTS-HD.MA.1.0|DTS-HD.MA.2.0|DTS-HD.MA.5.1|DTS-HD.MA.6.1|DTS-HD.MA.7.1|DTS-X.7.1|TrueHD.1.0|TrueHD.2.0|TrueHD.5.1|TrueHD.5.1.Atmos|TrueHD.6.1|TrueHD.7.1|TrueHD.7.1.Atmos|FLAC.1.0|FLAC.2.0|FLAC.5.1|DD1.0|DD2.0|DD5.1|DDP2.0|DDP5.1|AAC1.0|AAC2.0|AAC5.1';
  WordsVideo         = 'H.264|H.265|x264|x265|MPEG2|AVC|HEVC|VC-1';

function ConvertToPattern(const Words: WideString): WideString;
begin
  Result := '\b(' + WideReplaceStr(Words, '.', '\.') + ')\b';
end;

function ExtractWord(const Subject, Words: WideString): WideString;
var
  Matches: TWideStringArray;
begin
  Matches := MatchesRegEx(Subject, ConvertToPattern(Words), False);
  if Length(Matches) > 0 then
    Result := Matches[0]
  else
    Result := '';
end;

function Restructure(const Subject: WideString): WideString;
begin
  // (LANGUAGE).(VERSION).(RELEASE).(FORM).(RESOLUTION).(SOURCE).(SOURCE FORMAT).(VARIANT).(HDR).(AUDIO).(VIDEO)
  Result :=
    ExtractWord(Subject, WordsLanguage) + '.' +
    ExtractWord(Subject, WordsVersion) + '.' +
    ExtractWord(Subject, WordsRelease) + '.' +
    ExtractWord(Subject, WordsForm) + '.' +
    ExtractWord(Subject, WordsResolution) + '.' +
    ExtractWord(Subject, WordsSource) + '.' +
    ExtractWord(Subject, WordsSourceFormat) + '.' +
    ExtractWord(Subject, WordsVariant) + '.' +
    ExtractWord(Subject, WordsHdr) + '.' +
    ExtractWord(Subject, WordsAudio) + '.' +
    ExtractWord(Subject, WordsVideo);
end;

var
  BaseName: WideString;
  Matches: TWideStringArray;
begin
  BaseName := WideExtractBaseName(FileName);
  Matches := SubMatchesRegEx(BaseName, '\A'+'(.+?\.\d{4}\.)'+'(.+)'+'(-[^.]+)'+'\Z', False);
  if Length(Matches) = 3 then
  begin
    FileName := Matches[0] + Restructure(Matches[1]) + Matches[2] + WideExtractFileExt(FileName);
    FileName := ReplaceRegEx(FileName, '\.\.+', '.', False, False);
  end;
end.
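The behaviour of this script can be reproduced outside ReNamer with a short Python model, assuming MatchesRegEx returns the leftmost match first, as Python's `re.search` does. It is only a model for testing keyword lists, not a replacement for the Pascal Script. It reproduces the example output above, including the dropped REMASTERED, UHD, 10bit, HDR and Atmos.

```python
import os
import re

# Keyword groups from the script's constants, in output order (LANGUAGE, VERSION,
# RELEASE, FORM, RESOLUTION, SOURCE, SOURCE FORMAT, VARIANT, HDR, AUDIO, VIDEO).
GROUPS = [
    "GERMAN|FRENCH|ITALIAN|SPANISH|PORTUGUESE|JAPANESE|CHINESE|KOREAN|SWEDISH|FINNISH",
    "IMAX|DIRECTORS CUT|THEATRICAL CUT|UNRATED CUT|EXTENDED CUT|UNCUT|REMASTERED",
    "PROPER|REPACK|RERIP|INTERNAL|DIRFIX",
    "HYBRID",
    "480p|1080p|1080i|2160p",
    "NF|AMZN|DSNP|HULU|ATVP|STAN|PMTP|HMAX|PCOK|SHO|BCORE|iT|iP",
    "WEB-DL|WEBRip|BluRay|UHD BluRay|HDTV",
    "REMUX",
    "DV|HDR|HDR10+|HLG|SDR",
    "DTS-HD.MA.1.0|DTS-HD.MA.2.0|DTS-HD.MA.5.1|DTS-HD.MA.6.1|DTS-HD.MA.7.1|DTS-X.7.1|"
    "TrueHD.1.0|TrueHD.2.0|TrueHD.5.1|TrueHD.5.1.Atmos|TrueHD.6.1|TrueHD.7.1|TrueHD.7.1.Atmos|"
    "FLAC.1.0|FLAC.2.0|FLAC.5.1|DD1.0|DD2.0|DD5.1|DDP2.0|DDP5.1|AAC1.0|AAC2.0|AAC5.1",
    "H.264|H.265|x264|x265|MPEG2|AVC|HEVC|VC-1",
]

SAMPLE = ("SAMPLE.MOVIE.2022.SPANISH.IMAX.REMASTERED.PROPER.HYBRID.2160p.UHD.BluRay"
          ".REMUX.DV.10bit.HDR.TrueHD.7.1.Atmos.HEVC-SAMPLEGROUP.avi")


def convert_to_pattern(words):
    # Like ConvertToPattern here: escape "." only and wrap in \b(...)\b.
    return r"\b(" + words.replace(".", r"\.") + r")\b"


def extract_word(subject, words):
    # Like ExtractWord: keep only the leftmost match (Matches[0]), or "".
    m = re.search(convert_to_pattern(words), subject, re.IGNORECASE)
    return m.group(0) if m else ""


def rename(file_name):
    base, ext = os.path.splitext(file_name)
    m = re.match(r"\A(.+?\.\d{4}\.)(.+)(-[^.]+)\Z", base)
    if not m:
        return file_name  # the script leaves non-matching names unchanged
    middle = ".".join(extract_word(m.group(2), words) for words in GROUPS)
    # Empty groups leave ".." behind; squeeze them like the final ReplaceRegEx.
    return re.sub(r"\.\.+", ".", m.group(1) + middle + m.group(3) + ext)


print(rename(SAMPLE))
# SAMPLE.MOVIE.2022.SPANISH.IMAX.PROPER.HYBRID.2160p.BluRay.REMUX.DV.TrueHD.7.1.HEVC-SAMPLEGROUP.avi
```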

#7 2022-10-22 05:00

loungebob
Member
Registered: 2015-09-29
Posts: 45

oooh exciting stuff! thanks so much.

I have it running and it works, but I need to get some sleep now. I stumbled across something, and I assume it's a logic error on my part. do I understand this correctly: the script only uses one occurrence per word group? the example output above is not entirely correct because "remastered" got dropped from the output filename, I assume because "imax" comes first in the list and was the first match. I have the same issue with some of my names, where the script drops "uhd" because it already matched "bluray", which comes before "uhd bluray" in the list.

can the script match multiple tokens per word group, like imax and remastered, with their order in the filename being the order in which they get matched within the word group?
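A quick Python check of the single-match behaviour, assuming ReNamer's regex engine searches left to right like Python's `re`: the keyword that occurs first in the filename wins, whatever its position in the word list. If the filenames write it as UHD.BluRay, the entry "UHD BluRay" (with a space) can never match, whatever its place in the list. `first_match` is a hypothetical stand-in for the script's ExtractWord.

```python
import re


def first_match(subject, words):
    # Leftmost match of \b(words)\b with "." escaped, case-insensitive,
    # i.e. what the first script's ExtractWord keeps.
    pattern = r"\b(" + words.replace(".", r"\.") + r")\b"
    m = re.search(pattern, subject, re.IGNORECASE)
    return m.group(0) if m else ""


subject = "SPANISH.IMAX.REMASTERED.PROPER.2160p.UHD.BluRay.REMUX"

# The keyword that occurs first in the name wins, whatever its place in the list.
print(first_match(subject, "IMAX|REMASTERED"))    # IMAX
print(first_match(subject, "REMASTERED|IMAX"))    # IMAX

# "UHD BluRay" (with a space) cannot match the dotted "UHD.BluRay".
print(first_match(subject, "UHD BluRay|BluRay"))  # BluRay
print(first_match(subject, "UHD.BluRay|BluRay"))  # UHD.BluRay
```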

#8 2022-11-09 14:17

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,367

loungebob wrote:

can the script match multiple tokens per word group, like imax and remastered, with their order in the filename being the order in which they get matched within the word group?

The updated script below will use all matching words within the group.

However, you will need to rearrange the order of words in some groups to make sure that the longest variants appear first. For example, with words ordered like "...|TrueHD.7.1|TrueHD.7.1.Atmos|...", the pattern will always match "TrueHD.7.1" first and never even try "TrueHD.7.1.Atmos", so you need to reorder them like "...|TrueHD.7.1.Atmos|TrueHD.7.1|...".
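The effect of word order can be checked with a small Python sketch that, like ExtractWords, collects every `\b(...)\b` match from left to right. `longest_first` is a hypothetical helper (not part of ReNamer) that sorts a group's alternatives by length, so longer variants are always tried first without reordering the list by hand.

```python
import re


def extract_words(subject, words):
    # Like ExtractWords: every \b(...)\b match from left to right, joined with ".".
    pattern = r"\b(" + words.replace(".", r"\.") + r")\b"
    return ".".join(m.group(0) for m in re.finditer(pattern, subject, re.IGNORECASE))


def longest_first(words):
    # Hypothetical helper: reorder alternatives so longer ones are tried first.
    return "|".join(sorted(words.split("|"), key=len, reverse=True))


subject = "2160p.BluRay.TrueHD.7.1.Atmos.HEVC"
print(extract_words(subject, "TrueHD.7.1|TrueHD.7.1.Atmos"))  # TrueHD.7.1
print(extract_words(subject, "TrueHD.7.1.Atmos|TrueHD.7.1"))  # TrueHD.7.1.Atmos
print(longest_first("TrueHD.7.1|TrueHD.7.1.Atmos"))           # TrueHD.7.1.Atmos|TrueHD.7.1
```

Alternation order only matters between keywords that can start at the same position in the name; the order of the output tokens still follows their position in the filename.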

const
  WordsLanguage      = 'GERMAN|FRENCH|ITALIAN|SPANISH|PORTUGUESE|JAPANESE|CHINESE|KOREAN|SWEDISH|FINNISH';
  WordsVersion       = 'IMAX|DIRECTORS CUT|THEATRICAL CUT|UNRATED CUT|EXTENDED CUT|UNCUT|REMASTERED';
  WordsRelease       = 'PROPER|REPACK|RERIP|INTERNAL|DIRFIX';
  WordsForm          = 'HYBRID';
  WordsResolution    = '480p|1080p|1080i|2160p';
  WordsSource        = 'NF|AMZN|DSNP|HULU|ATVP|STAN|PMTP|HMAX|PCOK|SHO|BCORE|iT|iP';
  WordsSourceFormat  = 'WEB-DL|WEBRip|BluRay|UHD BluRay|HDTV';
  WordsVariant       = 'REMUX';
  WordsHdr           = 'DV|HDR|HDR10+|HLG|SDR';
  // Longer variants must come before their prefixes (e.g. TrueHD.7.1.Atmos before TrueHD.7.1).
  WordsAudio         = 'DTS-HD.MA.1.0|DTS-HD.MA.2.0|DTS-HD.MA.5.1|DTS-HD.MA.6.1|DTS-HD.MA.7.1|DTS-X.7.1|TrueHD.1.0|TrueHD.2.0|TrueHD.5.1.Atmos|TrueHD.5.1|TrueHD.6.1|TrueHD.7.1.Atmos|TrueHD.7.1|FLAC.1.0|FLAC.2.0|FLAC.5.1|DD1.0|DD2.0|DD5.1|DDP2.0|DDP5.1|AAC1.0|AAC2.0|AAC5.1';
  WordsVideo         = 'H.264|H.265|x264|x265|MPEG2|AVC|HEVC|VC-1';

function ConvertToPattern(const Words: WideString): WideString;
begin
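  // Escape "." and "+" so they are matched literally. The second replacement
  // must work on Result; starting again from Words would discard the escaped dots.
  Result := WideReplaceStr(Words, '.', '\.');
  Result := WideReplaceStr(Result, '+', '\+');
  // Note: \b cannot match between two non-word characters, so a keyword ending
  // in "+" (e.g. HDR10+) is not found when a dot follows it.
  Result := '\b(' + Result + ')\b';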
end;

function ExtractWords(const Subject, Words: WideString): WideString;
var
  Matches: TWideStringArray;
begin
  Result := '';
  Matches := MatchesRegEx(Subject, ConvertToPattern(Words), False);
  if Length(Matches) > 0 then
    Result := WideJoinStrings(Matches, '.');
end;

function Restructure(const Subject: WideString): WideString;
begin
  // (LANGUAGE).(VERSION).(RELEASE).(FORM).(RESOLUTION).(SOURCE).(SOURCE FORMAT).(VARIANT).(HDR).(AUDIO).(VIDEO)
  Result :=
    ExtractWords(Subject, WordsLanguage) + '.' +
    ExtractWords(Subject, WordsVersion) + '.' +
    ExtractWords(Subject, WordsRelease) + '.' +
    ExtractWords(Subject, WordsForm) + '.' +
    ExtractWords(Subject, WordsResolution) + '.' +
    ExtractWords(Subject, WordsSource) + '.' +
    ExtractWords(Subject, WordsSourceFormat) + '.' +
    ExtractWords(Subject, WordsVariant) + '.' +
    ExtractWords(Subject, WordsHdr) + '.' +
    ExtractWords(Subject, WordsAudio) + '.' +
    ExtractWords(Subject, WordsVideo);
end;

var
  BaseName: WideString;
  Matches: TWideStringArray;
begin
  BaseName := WideExtractBaseName(FileName);
  Matches := SubMatchesRegEx(BaseName, '\A'+'(.+?\.\d{4}\.)'+'(.+)'+'(-[^.]+)'+'\Z', False);
  if Length(Matches) = 3 then
  begin
    FileName := Matches[0] + Restructure(Matches[1]) + Matches[2] + WideExtractFileExt(FileName);
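    FileName := ReplaceRegEx(FileName, '\.\.+', '.', False, False);
    // Drop a dot left in front of the group name when the last keyword group is empty.
    FileName := ReplaceRegEx(FileName, '\.+-', '-', False, False);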
  end;
end.
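For testing keyword lists outside ReNamer, here is a Python model of this script in which ConvertToPattern escapes both "." and "+" (each replacement applied to the previous result) and the Atmos variants come before their shorter prefixes. It assumes MatchesRegEx returns all non-overlapping matches from left to right, like Python's `re.finditer`, and it also removes a dot left in front of "-GROUP" when the last group is empty. `dot_delimited_pattern` is a hypothetical alternative to `\b`: since `\b` never matches between "+" and ".", HDR10+ followed by a dot is missed by the `\b` pattern; requiring a ".", "-" or the string edge around each keyword avoids that in Python, and whether ReNamer's regex engine accepts the same lookarounds is an assumption to check.

```python
import os
import re

# Keyword groups in output order (LANGUAGE ... VIDEO); Atmos variants listed first.
GROUPS = [
    "GERMAN|FRENCH|ITALIAN|SPANISH|PORTUGUESE|JAPANESE|CHINESE|KOREAN|SWEDISH|FINNISH",
    "IMAX|DIRECTORS CUT|THEATRICAL CUT|UNRATED CUT|EXTENDED CUT|UNCUT|REMASTERED",
    "PROPER|REPACK|RERIP|INTERNAL|DIRFIX",
    "HYBRID",
    "480p|1080p|1080i|2160p",
    "NF|AMZN|DSNP|HULU|ATVP|STAN|PMTP|HMAX|PCOK|SHO|BCORE|iT|iP",
    "WEB-DL|WEBRip|BluRay|UHD BluRay|HDTV",
    "REMUX",
    "DV|HDR|HDR10+|HLG|SDR",
    "DTS-HD.MA.1.0|DTS-HD.MA.2.0|DTS-HD.MA.5.1|DTS-HD.MA.6.1|DTS-HD.MA.7.1|DTS-X.7.1|"
    "TrueHD.1.0|TrueHD.2.0|TrueHD.5.1.Atmos|TrueHD.5.1|TrueHD.6.1|TrueHD.7.1.Atmos|TrueHD.7.1|"
    "FLAC.1.0|FLAC.2.0|FLAC.5.1|DD1.0|DD2.0|DD5.1|DDP2.0|DDP5.1|AAC1.0|AAC2.0|AAC5.1",
    "H.264|H.265|x264|x265|MPEG2|AVC|HEVC|VC-1",
]

SAMPLE = ("SAMPLE.MOVIE.2022.SPANISH.IMAX.REMASTERED.PROPER.HYBRID.2160p.UHD.BluRay"
          ".REMUX.DV.10bit.HDR.TrueHD.7.1.Atmos.HEVC-SAMPLEGROUP.avi")


def escape_words(words):
    # Escape "." and then "+", each step building on the previous result.
    return words.replace(".", r"\.").replace("+", r"\+")


def convert_to_pattern(words):
    return r"\b(" + escape_words(words) + r")\b"


def dot_delimited_pattern(words):
    # Hypothetical: keyword must sit between start/end, "." or "-" (lets HDR10+ match).
    return r"(?<![^.\-])(" + escape_words(words) + r")(?![^.\-])"


def extract_words(subject, words, pattern_fn=convert_to_pattern):
    # Like ExtractWords: all matches from left to right, joined with ".".
    found = re.finditer(pattern_fn(words), subject, re.IGNORECASE)
    return ".".join(m.group(0) for m in found)


def rename(file_name):
    base, ext = os.path.splitext(file_name)
    m = re.match(r"\A(.+?\.\d{4}\.)(.+)(-[^.]+)\Z", base)
    if not m:
        return file_name  # unchanged, like the script when Length(Matches) <> 3
    middle = ".".join(extract_words(m.group(2), words) for words in GROUPS)
    new_name = re.sub(r"\.\.+", ".", m.group(1) + middle + m.group(3) + ext)
    # Drop a dot left in front of "-GROUP" when the last keyword group is empty.
    return re.sub(r"\.+-", "-", new_name)


print(rename(SAMPLE))
# SAMPLE.MOVIE.2022.SPANISH.IMAX.REMASTERED.PROPER.HYBRID.2160p.BluRay.REMUX.DV.HDR.TrueHD.7.1.Atmos.HEVC-SAMPLEGROUP.avi
print(extract_words("DV.HDR10+.DDP5.1", GROUPS[8]))                         # DV
print(extract_words("DV.HDR10+.DDP5.1", GROUPS[8], dot_delimited_pattern))  # DV.HDR10+
```

UHD and 10bit are still dropped in this example: 10bit is in none of the groups, and "UHD BluRay" is written with a space, so it does not match the dotted UHD.BluRay.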

#9 2022-11-20 13:59

loungebob
Member
Registered: 2015-09-29
Posts: 45

thank you so much! I will test drive this as soon as I have a few spare hours and be back with results :-)
