Pipelined conjugate gradient method with a single non-blocking reduction per two iterations. See Pipelined Krylov Methods.
This method needs only a single non-blocking reduction per two iterations, compared to two blocking reductions per iteration for standard CG. The non-blocking reduction is overlapped with two matrix-vector products and two preconditioner applications.
MPI configuration may be necessary for reductions to make asynchronous progress, which is important for the performance of pipelined methods. See "What steps are necessary to make the pipelined solvers execute efficiently?"
Manasi Tiwari, Computational and Data Sciences, Indian Institute of Science, Bangalore
Manasi Tiwari and Sathish Vadhiyar, “Pipelined Conjugate Gradient Methods for Distributed Memory Systems”, Submitted to International Conference on High Performance Computing, Data and Analytics 2020.
The implementation contains a good amount of hand-tuned fusion of multiple inner products and similar computations on multiple vectors.
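To illustrate what fusing inner products means, the following is a minimal sketch in C and not the code used by this solver. The type `FusedDots` and the function `fused_dots3` are hypothetical names introduced only for this example. The sketch computes three inner products in a single pass over the vectors, so each entry is loaded once instead of once per product.

```c
#include <stddef.h>

/* Hypothetical result type (not part of any library): holds the three
   inner products (x,y), (x,z) and (y,z). */
typedef struct {
    double xy;
    double xz;
    double yz;
} FusedDots;

/* Hypothetical helper: computes all three inner products in one loop.
   Three separate dot-product calls would traverse the vectors three times;
   here each element of x, y and z is read from memory only once. */
FusedDots fused_dots3(const double *x, const double *y, const double *z, size_t n)
{
    FusedDots d = {0.0, 0.0, 0.0};
    for (size_t i = 0; i < n; ++i) {
        const double xi = x[i], yi = y[i], zi = z[i];
        d.xy += xi * yi;
        d.xz += xi * zi;
        d.yz += yi * zi;
    }
    return d;
}
```

In a distributed-memory setting, local partial sums gathered this way could presumably be packed into one buffer and combined with a single reduction, which matches the spirit of the single-reduction design described above. That step is omitted here to keep the sketch free of MPI dependencies.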