MtxVec VCL
The object implements everything necessary to simplify the threading of loops.
TMtxForLoop is specifically designed to make threading of computational routines which use MtxVec simpler and to make the code run faster.
Multi-threading is only meaningful if the processing takes more time than starting and stopping the threads that perform it. The start/stop time determines how "short" a processing task can be and still be sped up by using multiple threads.
A TMtxForLoop variable would typically be a global variable created only once, but multiple instances are also possible. By default it initializes as many threads in its thread pool as there are CPU cores, but this can be changed via its ThreadCount property. The threads wait in standby when not in use, so processing can be initiated very quickly. The cost to start the threads and detect the end of processing is about 50 µs; jobs taking less than about 0.1 ms therefore cannot be sped up with threading.
By default, the threads allocated by TMtxForLoop are super-conductive with respect to the MtxVec object cache. If multiple objects of TMtxForLoop type are instantiated, or the number of threads to be launched exceeds the CPU core count, then the size of the object cache, Controller.ThreadDimension, needs to be adjusted as well. This value must exceed the combined number of threads expected to execute concurrently. The default value of Controller.ThreadDimension is the CPU core count plus one.
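For example (the numbers here are illustrative, not from this text): with two TMtxForLoop instances of four threads each, up to eight threads may execute concurrently, so the cache must be dimensioned above that combined count:

```pascal
// Two pools of 4 threads each => up to 8 concurrently executing threads.
// Controller.ThreadDimension must exceed that combined count.
Controller.ThreadDimension := 9;
```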
It remains the user's responsibility to guard all shared variables against concurrent modification by multiple threads with critical sections. TMtxForLoop provides two methods for this purpose, Enter and Leave, which are not used internally.
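A sketch of how Enter and Leave might guard a shared accumulator inside a threaded loop body (the variable names and the surrounding context are hypothetical):

```pascal
// Sum is shared between threads; the read-modify-write must be serialized.
MainForLoop.Enter;
try
  Sum := Sum + PartialSum; // only one thread at a time executes this
finally
  MainForLoop.Leave;
end;
```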
The following loop can achieve about 50% faster execution on a (quad-core) Core i7:
This can serve as a guide to how much work ("fat") the code we want to thread must contain for multi-threading to be worthwhile.
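The loop referred to above is not reproduced in this text. As a purely illustrative sketch (the Run method and its callback signature are assumptions, not the documented API), a threaded MtxVec-style loop might look like:

```pascal
// Hypothetical sketch: split the index range across the thread pool and
// run a per-chunk callback whose work is heavy enough (>= ~5 ms) to
// amortize the ~50 us thread start/stop cost.
MainForLoop.Run(0, n - 1,
  procedure(FromIdx, ToIdx: Integer)
  var
    i: Integer;
  begin
    for i := FromIdx to ToIdx do
      Result.Values[i] := Exp(Sqrt(Data.Values[i])); // per-element work
  end);
```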
To make code future-proof, the recommended minimum job running time per thread is 5 ms. This ensures that as the number of CPU cores increases, the minimum running time per core remains well above 0.1 ms, allowing performance to scale linearly.
Copyright (c) 1999-2024 by Dew Research. All rights reserved.