The atomic group will only help in case of total matches. If there is no success the RegExp-Enging should advance the start by one character. After this the exception won't match anymore ...
So on using regexp:
To get the exceptions skipped they have to be matched always - even if there is no upper case following
at a position like 'mO it is hard to see that it is positioned inside 'ExceptionNumOne' by lookaround. So this should not be used
therefor a pattern like '(ex1|ex2|...|exn|[a-z])' should be used
to keep it it has to be part of the replacement (like $1)
a space should only be inserted if this is followed by an uppercase letter. In PascalScript there are no conditionals available. So it could only be inserted every time.
And to get rid of the wrong spaces a second rule is needed.
We need information for the second rule to recognize these spaces. Again no conditional so the correctly inserted spaces will also be handled here.
I came up with this:
1st Expression: (ExceptionNumOne|ExceptionTwo|[^A-Z]*[a-z])(?=([A-Z]\B)?)
1st Replace: $1< $2>
2nd Expression: <(( ).| )>
2nd Replace: $2
'[a-z]' instead of '[^A-Z]*[a-z]' would also work but this would lead to many '< >' between to lower case characters.
All together into a PascalScript:
const
EXCEPTIONS_FILENAME = 'path\exceptions.txt';
REPLCE1 = '$1< $2>';
REGEXP2 = '<(( ).| )>';
REPLCE2 = '$2';
var
Pattern : WideString;
Initialized : Boolean;
procedure Init();
begin
if not Initialized then
begin
Pattern := '(' +
WideJoinStrings(FileReadTextLines(EXCEPTIONS_FILENAME), '|') +
'|[^A-Z]*[a-z])(?=([A-Z]\B)?)';
Initialized := True;
end;
end;
begin
Init();
FileName := ReplaceRegEx(FileName, Pattern, REPLCE1, True, True)
FileName := ReplaceRegEx(FileName, REGEXP2, REPLCE2, True, True)
end.
Another special case comes to my mind: what if the exception word is contained but with additional lower case letters?
This is a good insight that I hadn't considered entirely. In your example I'd expect ExceptionNumOnetimeTest.txt to evaluate to either "ExceptionNumOne timeTest.txt" (Always isolate the exception) or "ExceptionNumOnetime Test.txt" (Perform camel/pascal case separation exactly after the exception ends).
I think I prefer option one, and always isolate the exception string.
Also I tried your regex above:
(?>ExceptionOnePlusMore|ExceptionOne|[a-z])(?=[A-Z]\B)
With a replace string of
$1
And it's just deleting the exceptions. Not sure I have the correct RegEx here.
If it's not too much to ask, can you possibly outline the RegEx as well as PascalScript you would use to read from a file that contains a list of exceptions? I don't care if I have to separate everything by a pipe in the file or restructure the content of the file to a pipe delimited string.
The biggest mystery to me right now is just the exact RegEx I need. I'm not very good at it.
Thanks for any help at all!
]]>Thanks much for the help. Unfortunately the expression you supplied isn't working:
Ah you are right - if the exception matches but isn't followed by an upper case letter the regexp engine throws away the match and tries to find another match ...
Using the first part as atomic match would help here. As there won't be a second try you have to make sure that your exceptions are placed in the right order if one exception is the beginning of another longer one:
(?>ExceptionOnePlusMore|ExceptionOne|[a-z])(?=[A-Z]\B)
Another special case comes to my mind: what if the exception word is contained but with additional lower case letters? Like in this text:
ExceptionNumOnetimeTest.txt
Also I plan on having a large amount of exceptions (100+) so some kind of PascalScript implementation I think would be better so I can store my exceptions in a constant.
You could also use the shown expression inside a PascalScript block by calling ReplaceRegEx. Here you also could consider to write an initialization block (run only on the first call/file) where you read the exceptions from a file by FileReadContent.
Words could be placed in separate lines if you replace the line breaks by | (pipe or vertical bar) after reading.
]]>One advantage of lookahead and lookbehind is that you don't need to reinsert these parts on replacement.
In other words: if you check your surroundings without using '(?<=...)', '(?<!...)', '(?=...)', '(?!...)' you just have to use these subexpressions inside your replace string via $1..$9.
But! The next replacement would start after the catched string. So in your special situation where you to recognize exceptions by their first character it would be wise to stay with a positive lookahead (at the end).
Try this Expression:
(ExceptionNumOne|ExceptionTwo|[a-z])(?=[A-Z]\B)
with this Replace (containing a space at the end):
$1
and make sure the option "Case-sensitive" is set.
Hope that helps!
Thanks much for the help. Unfortunately the expression you supplied isn't working:
I need the exceptions to remain as they are no matter if they are part of a word, or isolated by whitespace. Right now if the exception occurs alone followed by a space, the word is separated (Check image above).
Also I plan on having a large amount of exceptions (100+) so some kind of PascalScript implementation I think would be better so I can store my exceptions in a constant.
Really appreciate the help though!
]]>In other words: if you check your surroundings without using '(?<=...)', '(?<!...)', '(?=...)', '(?!...)' you just have to use these subexpressions inside your replace string via $1..$9.
But! The next replacement would start after the catched string. So in your special situation where you to recognize exceptions by their first character it would be wise to stay with a positive lookahead (at the end).
Try this Expression:
(ExceptionNumOne|ExceptionTwo|[a-z])(?=[A-Z]\B)
with this Replace (containing a space at the end):
$1
and make sure the option "Case-sensitive" is set.
Hope that helps!
]]>This script for the Pascal Script rule should do it.
List all your special cases in the Exceptions constant. You will need to use at least ReNamer 7.4.0.2 Beta.
Hi again, I just got around to testing the script.
While it works if an exception occurs at the beginning of the filename with no spaces, it falls apart if the exception occurs anywhere else in the string. See this image:
Input:
ExceptionTwoTest
ExceptionTwo Test
Test ExceptionTwo
Test ExceptionNumOne
ExceptionNumOneTest
ExceptionNumOne Test
Correct Output:
ExceptionTwo Test
ExceptionTwo Test
Test ExceptionTwo
Test ExceptionNumOne
ExceptionNumOne Test
ExceptionNumOne Test
Actual Output:
ExceptionTwo Test
Exception Two Test
Test Exception Two
Text Exception Num One
ExceptionNumOne Test
Exception Num One Test
I think it has to do with the regex. I'm not sure how to modify it though to fix the issue. I'll keep experimenting.
If you happen to know a fix that would be great!
]]>List all your special cases in the Exceptions constant. You will need to use at least ReNamer 7.4.0.2 Beta.
const
Exceptions = 'HoRNet|more|exceptions|here';
var
Positions: TIntegerArray;
Matches: TWideStringArray;
I, NumMatches, PositionAnchor: Integer;
begin
PositionAnchor := Length(FileName);
NumMatches := FindRegEx(FileName, '(?<=[a-z])(?=[A-Z]\B)', True, Positions, Matches);
for I := NumMatches - 1 downto 0 do
begin
if Positions[I] <= PositionAnchor then
begin
Insert(' ', FileName, Positions[I]);
Matches := MatchesRegEx(Copy(FileName, 1, Positions[I]-1), '(' + Exceptions + ')\Z', True);
if Length(Matches) > 0 then
PositionAnchor := Positions[I] - Length(Matches[0]);
end;
end;
end.
Example input:
HoRNetCompExp_x64
HoRNetDynamicsControl_x64
HoRNetHarmonics_x64
HoRNetHTS9_x64
HoRNetMulticompPlusMK2_x64
HoRNetSongKeyMK3_x64
HoRNetTape_x64
HoRNetValvola_x64
Example output:
HoRNet Comp Exp_x64
HoRNet Dynamics Control_x64
HoRNet Harmonics_x64
HoRNet HTS9_x64
HoRNet Multicomp Plus MK2_x64
HoRNet Song Key MK3_x64
HoRNet Tape_x64
HoRNet Valvola_x64
Regarding the error message:
It should be possible to achieve the same result with multiple rules or pascal script.
If you don't mind, this is the expression that I want to get working the most:
https://stackoverflow.com/questions/752 … 3_75228498
Expression:
(?:HoRNet|more|exceptions|here)(*SKIP)(*F)|(?<=[a-z])(?=[A-Z]\B)
Replace:
(A space)
Do you know how I can get this working through usage of multiple rules or PascalScript?
The expression separates CamelCase similar to the clean-up function, but allows for a "whitelist" that when encountered doesn't get modified.
Does PascalScript utilize a more robust RegEx engine? If so, can I invoke more advanced RegEx expressions through PascalScript?
Any help at all would be greatly appreciated.
]]>TRegExpr compile: lookaround brackets must be at the very beginning/ending (pos 4)
This is a known limitation of the regex engine, as documented here:
https://regex.sorokin.engineer/en/lates … assertions
Brackets for lookahead must be at the very ending of expression, and brackets for lookbehind must be at the very beginning. So assertions between choices |, or inside groups, are not supported.
It should be possible to achieve the same result with multiple rules or pascal script.
]]>My exact goal is here (this is my post): https://stackoverflow.com/questions/752 … 2#75228472
The first response gave me the following solution:
Expression:
(?:(?<=\s)|(?<=\sv))(\d+)\s(?=\d)
Replace:
$1.
Unfortunately I am getting an error in Renamer.
TRegExpr compile: lookaround brackets must be at the very beginning/ending (pos 4)
Can someone explain to me how I can get this expression to work?
Or what regex engine that ReNamer uses so I can ask for more specific help on stackoverflow?
Any help at all would be great.
]]>