"but it still takes 15 minutes to clear a table of 500k entries?"
If you are clearing only a portion of the files (any number between 1 and N-1 out of N files), then a selective deletion is involved, and that is a slow process for the reasons explained earlier.
However, if you are clearing all files, then it should be significantly faster, more or less instantaneous. If it is still slow in your case, then please provide more specifics so that it can be investigated, e.g. version number, exact number of files, method of clearing, etc.

Somehow, I missed your reply; maybe it got filtered by Hotmail? Anyway, so you've improved it, but it still takes 15 minutes to clear a table of 500k entries? And if it's not O(n), then it's more than double that for 1M entries, which is still very long, considering that it takes just a few minutes to rename everything. I had to do this again recently, but I used a simple FOR loop in the command prompt; it was a bit slower than using ReNamer, but I didn't have to wait an hour for the window to close afterwards.
Alright, I'll probably be using ReNamer soon for something more than just changing file extensions, where the number of files is in the thousands rather than hundreds of thousands; I'll see about purchasing it then.
Thanks.

All additional optimizations have been implemented in v6.8.0.2 Beta, including a more efficient selective clearing for when all files match the clearing criteria.
No further optimizations can be applied at this stage, because the bottleneck is now a very complex third-party visual component.

"I thought all ReNamer needed was fast sequential operations"
Not exactly. The objects which represent files are abstracted away from the visual files table, which contains only virtual nodes. Linked file objects and their properties are retrieved on demand, as needed. The table can be shuffled around without ever affecting the underlying file objects or their order. There are multiple different structures in play to make most operations fast with the smallest possible memory footprint.
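To illustrate the idea (a hypothetical Python sketch, not ReNamer's actual implementation; all class and method names here are invented for the example), the visible table can hold only lightweight row indices while the file objects live in a separate, stable store. Reordering the rows never touches the objects themselves:

```python
class FileObject:
    def __init__(self, path):
        self.path = path  # in reality, properties would be computed lazily

class VirtualTable:
    def __init__(self, store):
        self.store = store                    # stable list, never reordered
        self.rows = list(range(len(store)))   # virtual nodes: just indices

    def sort_by_path(self, reverse=False):
        # Shuffles only the index list; the store keeps its original order.
        self.rows.sort(key=lambda i: self.store[i].path, reverse=reverse)

    def row(self, n):
        # The linked file object is retrieved on demand, as a virtual
        # list view would do when painting row n.
        return self.store[self.rows[n]]

store = [FileObject(p) for p in ["b.txt", "a.txt", "c.txt"]]
table = VirtualTable(store)
table.sort_by_path()
print([table.row(i).path for i in range(3)])  # sorted view
print([f.path for f in store])                # store order unchanged
```

The point of the separation is that view operations (sorting, filtering) cost only index moves, not object moves.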
However, as you have noticed, disorderly and partial deletion operations suffer from a severe inefficiency. The use of an AVL tree should overcome this inefficiency.
"If you remember, let me know once you have a beta. Is there a way to get this forum to email when someone replies?"
Updates will be posted in this thread, but you don't seem to be subscribed to this topic.
Click on the "Subscribe to this topic" link at the bottom of the page.

Wow, ReNamer needs a tree? Binary trees are good for fast "random" inserts/deletes/searches, but I would not have guessed that you needed that for manipulating a list of filenames; I thought all ReNamer needed was fast sequential operations (i.e. a linked list or an array). AVL trees look amazing! Literally everything is O(log n), and rebalancing is O(1) on average; they've even got documented algorithms for Union, Intersection and Difference.
If you remember, let me know once you have a beta.
Is there a way to get this forum to email when someone replies? I have to come back here regularly right now just to check.
I rarely use ReNamer, so the "Free" option was enough for me (though I do pitch it to anyone who has rename issues; I tell them it's like a prototyping tool, only you don't have to code it once you're done), but I'll definitely purchase a license if it performs as well as you say.
Thank you.
Best Regards,
We have investigated several alternative data structures and benchmarked Search, Insert and Delete operations in various orders. Basic structures such as array and linked list simply grind to a halt for large datasets, at least for some of these operations.
Then we came across the AVL tree, a self-balancing binary search tree. It has shown clear superiority in all benchmarks for large datasets, at the cost of slightly worse performance for small datasets.
The plan is to use an AVL tree, hopefully in the near future.
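For illustration only (a minimal Python sketch, not ReNamer's implementation), here is the core of AVL insertion. Each insert rebalances locally with at most a couple of rotations, which keeps the tree height logarithmic even for already-sorted input, the worst case for a naive binary search tree:

```python
class Node:
    __slots__ = ("key", "left", "right", "height")
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1

def height(n):
    return n.height if n else 0

def update(n):
    n.height = 1 + max(height(n.left), height(n.right))

def balance(n):
    return height(n.left) - height(n.right)

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y); update(x)
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x); update(y)
    return y

def insert(n, key):
    if n is None:
        return Node(key)
    if key < n.key:
        n.left = insert(n.left, key)
    else:
        n.right = insert(n.right, key)
    update(n)
    b = balance(n)
    if b > 1 and key < n.left.key:       # left-left case
        return rotate_right(n)
    if b < -1 and key >= n.right.key:    # right-right case
        return rotate_left(n)
    if b > 1:                            # left-right case
        n.left = rotate_left(n.left)
        return rotate_right(n)
    if b < -1:                           # right-left case
        n.right = rotate_right(n.right)
        return rotate_left(n)
    return n

root = None
for k in range(1000):       # sorted input: worst case for a naive BST
    root = insert(root, k)
print(root.height)          # small, O(log n), versus 1000 for an unbalanced chain
```

Deletion rebalances the same way, which is what makes arbitrary-order removals cheap compared with an array or linked list.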

Yeah, you could rename the "Clear files table" option, or add a little "?" circle next to it saying that files with rename errors stay there, and then add a third option, "Clear all files on rename (quick)" or something similar. But I still think there has to be a better way. I mean, I can load Excel with a million lines and I am sure I can do lots of stuff with the data much quicker than ReNamer clears the file list (I haven't tried, though). You could maybe keep a count of the number of errors, and if it's less than half the number of list entries, build a new list with the files that have errors and flush the old list. ReNamer is really good at loading the list; it takes less than 2 minutes to load all 1.3M entries, and I am assuming that copying a few entries to a new list would be just as fast. It would go from an hour to a few minutes, a sizable improvement.
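The rebuild idea can be sketched like this (a hypothetical Python illustration of the suggestion, not ReNamer's code; the field names are invented): copy the few entries that must stay behind into a fresh list in one O(n) pass, then drop the old list wholesale instead of deleting entries one by one:

```python
# 100,000 files, of which every 1000th failed to rename.
files = [{"name": f"file{i}", "error": (i % 1000 == 0)} for i in range(100_000)]

# Deleting the renamed entries one by one from an array-backed list would
# shift the tail on every deletion: O(n^2) work in total.

# Rebuilding keeps only the survivors in a single O(n) pass; the old list
# is then released in one go.
files = [f for f in files if f["error"]]
print(len(files))  # 100 error entries remain
```

This is the same asymptotic win regardless of the underlying container, as long as the old list can be freed in bulk.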
Best Regards,

Currently, only the explicit clearing of all files via the context menu of the files table (Clear -> Clear All) is able to use the optimized clearing method.
Options which automatically clear files on rename actually perform a selective clearing. Option "clear files table on rename" skips files with an error status, while "clear renamed files on rename" only clears files with a renamed status.
A potential workaround could be to check whether the selective clearing would affect all files, and use the optimized clearing method in that case, but this would add extra overhead when it is not the case.
Another potential workaround is to add an option which clears all files on rename regardless of their status.

Bad news; it still takes an hour. So I thought maybe I was using the wrong option. I found the "Remove Renamed Files" and "Clear List" options. I was using "Remove Renamed Files", so I figured enabling "Clear List" would make it go really fast? No change. Then I thought maybe I should not be enabling both options, so I unchecked "Remove Renamed Files" and kept "Clear List", but still no change; it took an hour again.
Could this be some kind of cache problem, where the way you clear the lists constantly causes cache misses? And the Xeon in your workstation doesn't have that problem because it has bigger cache lines or something?
Just for your information, I don't see any problems with my memory subsystem:
Dual DDR4-3200 14-14-14-34 CR2
Read Speed 44768 MB/s
Write Speed 47505 MB/s
Latency 44.6 ns
Best Regards,

The fix has been integrated into ReNamer 6.7.0.9 Beta. Please try the latest development version:
http://www.den4b.com/download/renamer/beta
Clearing all files is significantly faster now.
Note: Partial clearing of files will still be slow, because it is not possible to apply the same optimization in this case.

Glad you found the problem. I never wanted to say "ReNamer is crap" or anything like that, and I am glad you took my criticism openly. As far as I am concerned, ReNamer is the best renaming utility out there, bar none; the "program your rename operation" model is exceptional, and you go beyond string manipulations with support for Tags etc. (and if I recall, you went nuts and included a scripting language). But something was really wrong with the clearing up of the renamed items, so I said it "fails miserably with large number of files." A bit strong; my apologies.
Best Regards,

So, we have conducted an in-depth investigation and discovered an inefficiency in the clearing operation, which causes large memory relocations due to inadvertently poor sequencing. In simple terms, an efficient clearing of one list causes a very inefficient clearing of referenced items in another list.
The cost of this inefficiency grows steeply with the number of files, and can be amplified by differences in memory and CPU performance, as well as memory availability.
A fix is on the way and should become available shortly.
Thank you for your persistence!

First, I am not sure what your development station looks like, but your numbers are not exactly like mine: it definitely takes LESS time to "Create and Load" the files than it takes to "Clear" them. Even renaming, because of my write-back cache I guess, takes less time than clearing. It really took a whole 5 minutes to clear 500K files, on a Skylake @ 4.5GHz with fast memory; I have no idea how it can take you only 72 seconds.
Second, I had occasion to test this again, this time with 2017's junk mail archive. This amounted to ~1.3M files. Again, it took about 2 minutes to load all the files into ReNamer, and 2 minutes to rename everything. However, it took an HOUR to clear the list. That's right, a whole HOUR! Clearly this is an area that can be improved upon. It shouldn't be that renaming files is O(N) but clearing the list view is O(N^2) or worse. It's almost as if you were re-sorting the list every time an element was removed (it's not, of course, since removing an element from a sorted linked list keeps the list in order, but /something/ is eating up all those cycles).
Some suggestions: use memory pools. And since creating the list is super fast, why not, after a rename, walk the list and create a new list with the items that are supposed to stay behind, and then free up the entire pool where the first list resides? I don't know how you're doing it now, but it fails miserably with a large number of files.
Best Regards.

Let's have a look at the benchmark timings from a development station:
Num files    | 100K | 200K | 500K |
Create files |   20 |   56 |  163 | seconds
Load files   |   14 |   29 |   72 | seconds
Memory use   |   60 |  100 |  224 | MB
Preview      |    1 |    1 |    1 | seconds
Rename       |   59 |  132 |  357 | seconds
Clear files  |    3 |   12 |   72 | seconds
* A separate tool was used to time the creation of files.
* Memory use is the working memory set as displayed in the Process Explorer.
* Preview was performed with a simple Insert rule and without validation.
The time taken to clear all files grows roughly quadratically (4x the time for 2x the files), which is undesirable but unfortunately unavoidable. Memory management takes its toll when reshuffling millions of variable-size allocations, but it is necessary to maintain a clean state (stable and free of memory leaks). Note that the timing values are still very reasonable.
I would advise you to inspect and tweak your settings in ReNamer. Some of them may have a significant impact on performance when processing a large number of files.
In regards to the two frozen windows appearing in the taskbar: that is normal. ReNamer has a hidden application management window in addition to the main window. Windows may decide to show both in the taskbar when the application is too busy and appears frozen.
Killing ReNamer's process instead of waiting for it to clear all files is an acceptable workaround. The operating system will clean up the entire memory footprint instantly, instead of cleaning millions of memory allocations individually.