MtxVec VCL
The object implements everything necessary to simplify the threading of loops.
TMtxForLoop is specifically designed to make threading of computational routines which use MtxVec simpler and to make the code run faster.
Multi-threading is only meaningful if the processing takes more time than starting and stopping the threads that perform it. The start/stop time determines how "short" a processing task can be and still be sped up by using multiple threads.
A TMtxForLoop variable would typically be a global variable created only once, but multiple instances are also possible. By default it initializes as many threads in its thread pool as there are CPU cores, but this can be changed via its ThreadCount property. The threads wait in standby when not in use, so processing can be initiated very quickly. The cost to start the threads and detect the end of processing is about 50 µs; jobs taking less than about 0.1 ms therefore cannot be sped up with threading.
By default, the threads allocated by TMtxForLoop are super-conductive with respect to the MtxVec object cache. If multiple objects of TMtxForLoop type are instantiated, or the number of threads to be launched exceeds the CPU core count, then the size of the object cache, Controller.ThreadDimension, needs to be adjusted as well. This value must exceed the combined number of threads expected to execute concurrently. The default value of Controller.ThreadDimension is the CPU core count plus one.
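For example (the numbers here are illustrative, not from this text): with two TMtxForLoop instances of four threads each, up to eight threads may execute concurrently, so the cache must be dimensioned above that combined count:

```pascal
// Two pools of 4 threads each => up to 8 concurrently executing threads.
// Controller.ThreadDimension must exceed that combined count.
Controller.ThreadDimension := 9;
```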
It remains the user's responsibility to guard all shared variables against concurrent modification by multiple threads with critical sections. TMtxForLoop provides two methods for this purpose, Enter and Leave, which are not used internally.
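A sketch of how Enter and Leave might guard a shared accumulator inside a threaded loop body (the variable names and the surrounding context are hypothetical):

```pascal
// Sum is shared between threads; the read-modify-write must be serialized.
MainForLoop.Enter;
try
  Sum := Sum + PartialSum; // only one thread at a time executes this
finally
  MainForLoop.Leave;
end;
```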
The following loop can achieve about 50% faster execution on a (quad-core) Core i7:
This can serve as a guide to how much work ("fat") the code we want to thread must contain for multi-threading to be worthwhile.
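The loop referred to above is not reproduced in this text. As a purely illustrative sketch (the Run method and its callback signature are assumptions, not the documented API), a threaded MtxVec-style loop might look like:

```pascal
// Hypothetical sketch: split the index range across the thread pool and
// run a per-chunk callback whose work is heavy enough (>= ~5 ms) to
// amortize the ~50 us thread start/stop cost.
MainForLoop.Run(0, n - 1,
  procedure(FromIdx, ToIdx: Integer)
  var
    i: Integer;
  begin
    for i := FromIdx to ToIdx do
      Result.Values[i] := Exp(Sqrt(Data.Values[i])); // per-element work
  end);
```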
To make code future-proof, the recommended minimum job running time per thread is 5 ms. This ensures that as the number of CPU cores increases, the minimum running time per core remains well above 0.1 ms, allowing performance to scale linearly.
Copyright (c) 1999-2024 by Dew Research. All rights reserved.