## General¶

### How can I subscribe to the PETSc mailing lists?¶

See mailing list documentation

### What kind of parallel computers or clusters are needed to use PETSc? Or why do I get little speedup?¶

Important

PETSc can be used with any kind of parallel system that supports MPI BUT for any decent performance one needs:

• Fast, low-latency interconnect; any ethernet (even 10 GigE) simply cannot provide the needed performance.

• High per-core memory performance. Each core needs its own memory bandwidth of at least 2 gigabytes/second. Most modern computers are not bottlenecked by how fast they can perform calculations; rather, they are usually restricted by how quickly they can get their data.

To obtain good performance it is important that you know your machine! That is, how many compute nodes (generally, how many motherboards), how many memory sockets per node, how many cores per memory socket, and how much memory bandwidth each has.

If you do not know this and you can run MPI programs with mpiexec (that is, you don't have a batch system), run the following from $PETSC_DIR:

> make streams [NPMAX=maximum_number_of_mpi_processes_you_plan_to_use]

This will provide a summary of the bandwidth with different numbers of MPI processes and potential speedups. If you have a batch system:

> cd $PETSC_DIR/src/benchmarks/streams
> make MPIVersion

Submit MPIVersion to the batch system a number of times with 1, 2, 3, etc. MPI processes, collecting all of the output from the runs into the single file scaling.log. Copy scaling.log into the directory containing process.py and run:

> ./process.py createfile ; ./process.py


Even if you have enough memory bandwidth, performance can degrade if the OS switches processes between cores. Smart process to core/socket binding (this just means locking a process to a particular core or memory socket) may help you. For example, consider using fewer processes than cores and binding processes to separate sockets so that each process uses a different memory bus:

• MPICH2 binding with the Hydra process manager

> mpiexec.hydra -n 4 --binding cpu:sockets

• Open MPI binding

> mpiexec -n 4 --bysocket --bind-to-socket --report-bindings

• taskset, part of the util-linux package

Check man taskset for details. Make sure to set affinity for your program, not for the mpiexec program.

• numactl

In addition to task affinity, this tool also allows changing the default memory affinity policy. On Linux, the default policy is to attempt to find memory on the same memory bus that serves the core that a thread is running on when the memory is faulted (not when malloc() is called). If local memory is not available, it is found elsewhere, possibly leading to serious memory imbalances.

The option --localalloc allocates memory on the local NUMA node, similar to the numa_alloc_local() function in the libnuma library. The option --cpunodebind=nodes binds the process to a given NUMA node (note that this can be larger or smaller than a CPU (socket); a NUMA node usually has multiple cores).

The option --physcpubind=cpus binds the process to a given processor core (numbered according to /proc/cpuinfo, therefore including logical cores if Hyper-threading is enabled).

With Open MPI, you can use knowledge of the NUMA hierarchy and core numbering on your machine to calculate the correct NUMA node or processor number given the environment variable OMPI_COMM_WORLD_LOCAL_RANK. In most cases, it is easier to make mpiexec or a resource manager set affinities.
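
For illustration only (./myapp is a placeholder; this sketch assumes one MPI rank per socket and that the NUMA node numbers happen to match Open MPI's node-local rank numbers), a manual binding using OMPI_COMM_WORLD_LOCAL_RANK might look like:

> mpiexec -n 2 bash -c 'exec numactl --cpunodebind=$OMPI_COMM_WORLD_LOCAL_RANK --localalloc ./myapp'

Each rank's shell expands the environment variable that Open MPI sets for it, so each process is bound to its own NUMA node with local memory allocation.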

The software Open-MX provides faster communication for ethernet systems. We have not tried it, but it claims it can dramatically reduce latency and increase bandwidth on Linux systems. You must first install this software and then install MPICH or Open MPI to use it.

### What kind of license is PETSc released under?¶

See licensing documentation

### Why is PETSc written in C, instead of Fortran or C++?¶

When this decision was made, in the early 1990s, C enabled us to build data structures for storing sparse matrices, solver information, etc. in ways that Fortran simply did not allow. ANSI C was a complete standard that all modern C compilers supported. The language was identical on all machines. C++ was still evolving and compilers on different machines were not identical. Using C function pointers to provide data encapsulation and polymorphism allowed us to get many of the advantages of C++ without using such a large and more complicated language. It would have been natural and reasonable to have coded PETSc in C++; we opted to use C instead.


### How do such a small group of people manage to write and maintain such a large and marvelous package as PETSc?¶

1. We work very efficiently.

• We use Emacs for all editing (and strongly discourage our developers from using other editors). Having a uniform editing environment speeds development up immensely.

• Our manual pages are generated automatically from formatted comments in the code, thus alleviating the need for creating and maintaining manual pages.

• We employ automatic nightly tests of the entire PETSc library on several different machine architectures. This process significantly protects (no bug-catching process is perfect) against inadvertently introducing bugs with new additions. Every new feature must pass our suite of hundreds of tests as well as formal code review before it may be included.

2. We are very careful in our design (and are constantly revising our design)

• PETSc as a package should be easy to use, write, and maintain. Our mantra is to write code like everyone is using it.

3. We are willing to do the grunt work

• PETSc is regularly checked to make sure that all code conforms to our interface design. We will never keep in a bad design decision simply because changing it will require a lot of editing; we do a lot of editing.

4. We constantly seek out and experiment with new design ideas

• We retain the useful ones and discard the rest. All of these decisions are based not just on performance, but also on practicality.

5. Function and variable names must adhere to strict guidelines

• Even the rules about capitalization are designed to make it easy to figure out the name of a particular object or routine. Our memories are terrible, so careful consistent naming puts less stress on our limited human RAM.

6. The PETSc directory tree is carefully designed to make it easy to move throughout the entire package

7. We have a rich, robust, and fast bug reporting system

• petsc-maint@mcs.anl.gov is always checked, and we pride ourselves on responding quickly and accurately. Email is very lightweight, and the bug reporting system retains an archive of all reported problems and fixes, so it is easy to re-find fixes to previously discovered problems.

8. We contain the complexity of PETSc by using powerful object-oriented programming techniques

• Data encapsulation hides complex data formats and data movement behind a simple, human-readable interface. This is why your program cannot, for example, look directly at what is inside the object Mat.

• Polymorphism makes changing program behavior as easy as possible, and further abstracts the intent of your program from what is written in code. You call MatMult() regardless of whether your matrix is dense, sparse, parallel or sequential; you don’t call a different routine for each format.

9. We try to provide the functionality requested by our users

### For complex numbers will I get better performance with C++?¶

To use PETSc with complex numbers you may use the following configure options: --with-scalar-type=complex and either --with-clanguage=c++ or (the default) --with-clanguage=c. In our experience they deliver very similar performance (speed), but if you are concerned, just try both and see which is faster for your problem.

### How come when I run the same program on the same number of processes I get a “different” answer?¶

Inner products and norms in PETSc are computed using the MPI_Allreduce() command. For different runs the values can arrive at a given process (via MPI) in a different order, so some floating point arithmetic operations are performed in a different order. Since floating point arithmetic is not associative, the computed quantity may be slightly different.

Over a run the many slight differences in the inner products and norms will affect all the computed results. It is important to realize that none of the computed answers are any less right or wrong (in fact the sequential computation is no more right than the parallel ones). All answers are equal, but some are more equal than others.

The discussion above assumes that the exact same algorithm is being used on the different numbers of processes. When the algorithm differs with the number of processes (almost all preconditioner algorithms except Jacobi do), one expects to see (and does see) a greater difference in results for different numbers of processes. In some cases (for example the block Jacobi preconditioner) it may be that the algorithm works for some numbers of processes and does not work for others.

### How come when I run the same linear solver on a different number of processes it takes a different number of iterations?¶

The convergence of many of the preconditioners in PETSc, including the default parallel preconditioner block Jacobi, depends on the number of processes: the more processes, the (slightly) slower the convergence. This is the nature of iterative solvers; more parallelism means more "older" information is used in the solution process, hence slower convergence.

### Can PETSc use GPUs to speed up computations?¶

See GPU development roadmap for the latest information regarding the state of PETSc GPU integration.

See GPU install documentation for up-to-date information on installing PETSc to use GPU’s.

Quick summary of usage with CUDA:

• The VecType VECSEQCUDA, VECMPICUDA, or VECCUDA may be used with VecSetType() or -vec_type seqcuda, mpicuda, or cuda when VecSetFromOptions() is used.

• The MatType MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, or MATAIJCUSPARSE may be used with MatSetType() or -mat_type seqaijcusparse, mpiaijcusparse, or aijcusparse when MatSetFromOptions() is used.

• If you are creating the vectors and matrices with a DM, you can use -dm_vec_type cuda and -dm_mat_type aijcusparse.

Quick summary of usage with OpenCL (provided by the ViennaCL library):

• The VecType VECSEQVIENNACL, VECMPIVIENNACL, or VECVIENNACL may be used with VecSetType() or -vec_type seqviennacl, mpiviennacl, or viennacl when VecSetFromOptions() is used.

• The MatType MATSEQAIJVIENNACL, MATMPIAIJVIENNACL, or MATAIJVIENNACL may be used with MatSetType() or -mat_type seqaijviennacl, mpiaijviennacl, or aijviennacl when MatSetFromOptions() is used.

• If you are creating the vectors and matrices with a DM, you can use -dm_vec_type viennacl and -dm_mat_type aijviennacl.

General hints:

• It is useful to develop your code with the default vectors and then run production runs with the command line options to use the GPU since debugging on GPUs is difficult.

• All of the Krylov methods except KSPIBCGS run on the GPU.

• Parts of most preconditioners run directly on the GPU. After setup, PCGAMG runs fully on GPUs, without any memory copies between the CPU and GPU.

Some GPU systems (for example many laptops) only run with single precision; thus, PETSc must be built with the configure option --with-precision=single.
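
Putting the options above together, here is a minimal sketch (the global size 100 is an arbitrary example) of code whose vector and matrix types are picked up from the options database, so the same executable can run on the CPU or a GPU:

#include <petsc.h>

int main(int argc, char **argv)
{
  Vec            x;
  Mat            A;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  /* vector whose type is chosen at run time, e.g. -vec_type cuda or -vec_type viennacl */
  ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
  ierr = VecSetSizes(x, PETSC_DECIDE, 100);CHKERRQ(ierr);
  ierr = VecSetFromOptions(x);CHKERRQ(ierr);

  /* matrix whose type is chosen at run time, e.g. -mat_type aijcusparse or -mat_type aijviennacl */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 100, 100);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);

  /* ... assemble and solve as usual; the same code runs with or without a GPU ... */

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Such a program (call it ./myapp, a placeholder name) could then be run with, for example, -vec_type cuda -mat_type aijcusparse.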

### Can I run PETSc with extended precision?¶

Yes, with gcc 4.6 and later (and gfortran 4.6 and later). configure PETSc using the options --with-precision=__float128 and --download-f2cblaslapack.

Warning

External packages are not guaranteed to work in this mode!

### Why doesn’t PETSc use QD to implement support for extended precision?¶

We tried really hard but could not. The problem is that the QD C++ classes, though they try to mimic the built-in double data type, are not native types and cannot "just be used" in a general piece of numerical source code. Rather, the code has to be rewritten to live within the limitations of the QD classes. However, PETSc can be built to use quad precision, as detailed above.

### How do I cite PETSc?¶

Use these citations.

## Installation¶

### How do I begin using PETSc if the software has already been completely built and installed by someone else?¶

Assuming that the PETSc libraries have been successfully built for a particular architecture and level of optimization, a new user must merely:

1. Set PETSC_DIR to the full path of the PETSc home directory. This will be the location of the configure script, and is usually called "petsc" or some variation of that (for example, /home/username/petsc).



### How to fix the problem: PETSc was configured with one MPICH (or OpenMPI) mpi.h version but now appears to be compiling using a different MPICH (or OpenMPI) mpi.h version¶

This generally happens for one of two reasons:

• You previously ran configure with the option --download-mpich (or --download-openmpi) but later ran configure to use a version of MPI already installed on the machine. Solution:

> rm -rf $PETSC_DIR/$PETSC_ARCH
> ./configure --your-args


### What does it mean when make check errors on PetscOptionsInsertFile()?¶

For example:

Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
See https://petsc.org/release/faq/
[0]PETSC ERROR: #1 PetscOptionsInsertFile() line 563 in /Users/barrysmith/Src/PETSc/src/sys/objects/options.c
[0]PETSC ERROR: #2 PetscOptionsInsert() line 720 in /Users/barrysmith/Src/PETSc/src/sys/objects/options.c
[0]PETSC ERROR: #3 PetscInitialize() line 828 in /Users/barrysmith/Src/PETSc/src/sys/objects/pinit.c

• You may be using the wrong mpiexec for the MPI you have linked PETSc with.

• You have VPN enabled on your machine whose network settings may not play well with MPI.

The machine has a funky network configuration and for some reason MPICH is unable to communicate between processes with the socket connections it has established. This can happen even if you are running MPICH on just one machine. Often you will find that ping hostname fails with this network configuration; that is, processes on the machine cannot even connect to the same machine. You can try completely disconnecting your machine from the network and see if make check then works, or speak with your system administrator. You can also try the configure options --download-mpich or --download-mpich-device=ch3:nemesis.

## Usage¶

### How can I redirect PETSc’s stdout and stderr when programming with a GUI interface in Windows Developer Studio or to C++ streams?¶

To overload just the error messages, write your own MyPrintError() function that does whatever you want (including popping up windows etc.) and use it as shown below.

extern "C" {
int PASCAL WinMain(HINSTANCE,HINSTANCE,LPSTR,int);
};

#include <petscsys.h>
#include <mpi.h>

const char help[] = "Set up from main";

int MyPrintError(const char error[], ...)
{
printf("%s", error);
return 0;
}

int main(int ac, char *av[])
{
char           buf[256];
HINSTANCE      inst;
PetscErrorCode ierr;

inst = (HINSTANCE)GetModuleHandle(NULL);
PetscErrorPrintf = MyPrintError;

buf[0] = 0;
for (int i = 1; i < ac; ++i) {
strcat(buf, av[i]);
strcat(buf, " ");
}

ierr = PetscInitialize(&ac, &av, NULL, help);if (ierr) return ierr;

return WinMain(inst, NULL, buf, SW_SHOWNORMAL);
}


Place this file in the project and compile with these preprocessor definitions:

WIN32
_DEBUG
_CONSOLE
_MBCS
USE_PETSC_LOG
USE_PETSC_BOPT_g
USE_PETSC_STACK
_AFXDLL

and these linker options:

/nologo
/subsystem:console
/incremental:yes
/debug
/machine:I386
/nodefaultlib:"libcmtd.lib"
/nodefaultlib:"libcd.lib"
/nodefaultlib:"mvcrt.lib"
/pdbtype:sept


Note

The above is compiled and linked as if it were a console program. The linker will search for main(), and from it WinMain() will be started. This works with MFC templates and derived classes too.

When writing a Windows console application you do not need to do anything; stdout and stderr are automatically output to the console window.

To change where all PETSc stdout and stderr go (you can also reassign PetscVFPrintf() to handle stdout and stderr any way you like), write the following function:

PetscErrorCode mypetscvfprintf(FILE *fd, const char format[], va_list Argp)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  if (fd != stdout && fd != stderr) { /* handle regular files */
    ierr = PetscVFPrintfDefault(fd, format, Argp);CHKERRQ(ierr);
  } else {
    char buff[1024]; /* Make sure to assign a large enough buffer */
    int  length;

    ierr = PetscVSNPrintf(buff, 1024, format, &length, Argp);CHKERRQ(ierr);

    /* now send buff to whatever stream or whatever you want */
  }
  PetscFunctionReturn(0);
}


Then assign PetscVFPrintf = mypetscvfprintf before PetscInitialize() in your main program.
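
For example, a minimal sketch of the main program (the arguments to PetscInitialize() are placeholders):

int main(int argc, char **argv)
{
  PetscErrorCode ierr;

  PetscVFPrintf = mypetscvfprintf;  /* must be assigned before PetscInitialize() */
  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  /* ... the rest of the program ... */
  ierr = PetscFinalize();
  return ierr;
}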

### I want to use Hypre boomerAMG without GMRES but when I run -pc_type hypre -pc_hypre_type boomeramg -ksp_type preonly I don’t get a very accurate answer!¶

You should run with -ksp_type richardson to have PETSc run several V or W cycles; -ksp_type preonly causes boomerAMG to use only one V/W cycle. You can control how many cycles are used in a single application of the boomerAMG preconditioner with -pc_hypre_boomeramg_max_iter <it> (the default is 1). You can also control the tolerance boomerAMG uses to decide whether to stop before max_iter with -pc_hypre_boomeramg_tol <tol> (the default is 1.e-7). Run with -ksp_view to see all the hypre options used and -help | grep boomeramg to see all the command line options.
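
For example (./myapp is a placeholder for your executable and the option values are merely illustrative):

> ./myapp -ksp_type richardson -pc_type hypre -pc_hypre_type boomeramg -pc_hypre_boomeramg_max_iter 2 -pc_hypre_boomeramg_tol 1e-7 -ksp_view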

### How do I use PETSc for Domain Decomposition?¶

PETSc includes Additive Schwarz methods in the suite of preconditioners, under the umbrella of PCASM. These may be activated with the runtime option -pc_type asm. Various other options may be set, including the degree of overlap with -pc_asm_overlap <number> and the type of restriction/extension with -pc_asm_type [basic,restrict,interpolate,none], among several others. You may see the available ASM options by using -pc_type asm -help. See the procedural interfaces in the manual pages, for example PCASMSetType(), and check the index of the users manual for PCASMCreateSubdomains().
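
For example (./myapp is a placeholder and the option values are only illustrative):

> ./myapp -pc_type asm -pc_asm_overlap 2 -pc_asm_type restrict -sub_pc_type ilu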

PETSc also contains a domain decomposition inspired wirebasket or face based two level method where the coarse mesh to fine mesh interpolation is defined by solving specific local subdomain problems. It currently only works for 3D scalar problems on structured grids created with PETSc DMDA. See the manual page for PCEXOTIC and src/ksp/ksp/tutorials/ex45.c for an example.

PETSc also contains a balancing Neumann-Neumann type preconditioner, see the manual page for PCBDDC. This requires matrices be constructed with MatCreateIS() via the finite element method. See src/ksp/ksp/tests/ex59.c for an example.

### You have AIJ and BAIJ matrix formats, and SBAIJ for symmetric storage, how come no SAIJ?¶

Just for historical reasons; the SBAIJ format with blocksize one is just as efficient as an SAIJ would be.

### Can I Create BAIJ matrices with different size blocks for different block rows?¶

No. The MATBAIJ format only supports a single fixed block size on the entire matrix. But the MATAIJ format automatically searches for matching rows and thus still takes advantage of the natural blocks in your matrix to obtain good performance.

Note

If you use MATAIJ you cannot use MatSetValuesBlocked().

### How do I access the values of a remote parallel PETSc Vec?¶

1. On each process create a local Vec large enough to hold all the values it wishes to access.

2. Create a VecScatter that scatters from the parallel Vec into the local Vec.

3. Use VecGetArray() to access the values in the local Vec.

For example, assume we have distributed a vector vecGlobal of size $$N$$ over $$R$$ ranks and each rank holds $$N/R = m$$ values (assume for simplicity that $$N$$ is evenly divisible by $$R$$). We want each rank $$r$$ to gather the first $$n$$ (with $$n \leq m$$) values from its immediately superior neighbor $$r+1$$ (the final rank will retrieve from rank 0).

Vec            vecLocal;
IS             isLocal, isGlobal;
VecScatter     ctx;
PetscScalar    *arr;
PetscInt       N, firstGlobalIndex;
MPI_Comm       comm;
PetscMPIInt    r, R;
PetscErrorCode ierr;

/* Create sequential local vector, big enough to hold local portion */
ierr = VecCreateSeq(PETSC_COMM_SELF, n, &vecLocal);CHKERRQ(ierr);

/* Create IS to describe where we want to scatter to */
ierr = ISCreateStride(PETSC_COMM_SELF, n, 0, 1, &isLocal);CHKERRQ(ierr);

/* Compute the global indices */
ierr = VecGetSize(vecGlobal, &N);CHKERRQ(ierr);
ierr = PetscObjectGetComm((PetscObject) vecGlobal, &comm);CHKERRQ(ierr);
ierr = MPI_Comm_rank(comm, &r);CHKERRMPI(ierr);
ierr = MPI_Comm_size(comm, &R);CHKERRMPI(ierr);
firstGlobalIndex = r == R-1 ? 0 : (N/R)*(r+1);

/* Create IS that describes where we want to scatter from */
ierr = ISCreateStride(comm, n, firstGlobalIndex, 1, &isGlobal);CHKERRQ(ierr);

/* Create the VecScatter context */
ierr = VecScatterCreate(vecGlobal, isGlobal, vecLocal, isLocal, &ctx);CHKERRQ(ierr);

/* Gather the values */
ierr = VecScatterBegin(ctx, vecGlobal, vecLocal, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
ierr = VecScatterEnd(ctx, vecGlobal, vecLocal, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);

/* Retrieve and do work */
ierr = VecGetArray(vecLocal, &arr);CHKERRQ(ierr);
/* Work */
ierr = VecRestoreArray(vecLocal, &arr);CHKERRQ(ierr);

/* Don't forget to clean up */
ierr = ISDestroy(&isLocal);CHKERRQ(ierr);
ierr = ISDestroy(&isGlobal);CHKERRQ(ierr);
ierr = VecScatterDestroy(&ctx);CHKERRQ(ierr);
ierr = VecDestroy(&vecLocal);CHKERRQ(ierr);


### How do I collect to a single processor all the values from a parallel PETSc Vec?¶

1. Create the VecScatter context that will do the communication:

Vec            in_par, out_seq;
VecScatter     ctx;
PetscErrorCode ierr;

ierr = VecScatterCreateToAll(in_par, &ctx, &out_seq);CHKERRQ(ierr);

2. Initiate the communication (this may be repeated if you wish):

ierr = VecScatterBegin(ctx, in_par, out_seq, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
ierr = VecScatterEnd(ctx, in_par, out_seq, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
/* May destroy context now if no additional scatters are needed, otherwise reuse it */
ierr = VecScatterDestroy(&ctx);CHKERRQ(ierr);


Note that this simply concatenates in the parallel ordering of the vector (computed by the MPI_Comm_rank of the owning process). If you are using a Vec from DMCreateGlobalVector() you likely want to first call DMDAGlobalToNaturalBegin() followed by DMDAGlobalToNaturalEnd() to scatter the original Vec into the natural ordering in a new global Vec before calling VecScatterBegin()/VecScatterEnd() to scatter the natural Vec onto all processes.
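
As a minimal sketch of that case (assuming da is the DMDA and global the Vec obtained from DMCreateGlobalVector()):

Vec            natural, seq;
VecScatter     ctx;
PetscErrorCode ierr;

/* Move the DMDA-ordered global Vec into the natural (application) ordering */
ierr = DMDACreateNaturalVector(da, &natural);CHKERRQ(ierr);
ierr = DMDAGlobalToNaturalBegin(da, global, INSERT_VALUES, natural);CHKERRQ(ierr);
ierr = DMDAGlobalToNaturalEnd(da, global, INSERT_VALUES, natural);CHKERRQ(ierr);

/* Now gather the naturally ordered Vec onto every process */
ierr = VecScatterCreateToAll(natural, &ctx, &seq);CHKERRQ(ierr);
ierr = VecScatterBegin(ctx, natural, seq, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
ierr = VecScatterEnd(ctx, natural, seq, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);

ierr = VecScatterDestroy(&ctx);CHKERRQ(ierr);
ierr = VecDestroy(&natural);CHKERRQ(ierr);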

### How do I collect all the values from a parallel PETSc Vec on the 0th rank?¶

See FAQ entry on collecting to an arbitrary processor, but replace

ierr = VecScatterCreateToAll(in_par, &ctx, &out_seq);CHKERRQ(ierr);


with

ierr = VecScatterCreateToZero(in_par, &ctx, &out_seq);CHKERRQ(ierr);


Note

The same ordering considerations as discussed in the aforementioned entry also apply here.

### How can I read in or write out a sparse matrix in Matrix Market, Harwell-Boeing, Slapc or other ASCII format?¶

If you can read or write your matrix using Python or MATLAB/Octave, PetscBinaryIO modules are provided at $PETSC_DIR/lib/petsc/bin for each language that can assist with reading and writing. If you just want to convert MatrixMarket, you can use:

> python -m $PETSC_DIR/lib/petsc/bin/PetscBinaryIO convert matrix.mtx

to produce matrix.petsc.

You can also call the script directly or import it from your Python code. There is also a PETScBinaryIO.jl Julia package.

### When I dump some matrices and vectors to binary, I seem to be generating some empty files with .info extensions. What’s the deal with these?¶

PETSc binary viewers put some additional information into .info files like matrix block size. It is harmless but if you really don’t like it you can use -viewer_binary_skip_info or PetscViewerBinarySkipInfo().

Note

You need to call PetscViewerBinarySkipInfo() before PetscViewerFileSetName(). In other words you cannot use PetscViewerBinaryOpen() directly.
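
A minimal sketch (A stands for an assembled Mat, and matrix.dat is just an example file name):

PetscViewer    viewer;
PetscErrorCode ierr;

ierr = PetscViewerCreate(PETSC_COMM_WORLD, &viewer);CHKERRQ(ierr);
ierr = PetscViewerSetType(viewer, PETSCVIEWERBINARY);CHKERRQ(ierr);
ierr = PetscViewerFileSetMode(viewer, FILE_MODE_WRITE);CHKERRQ(ierr);
ierr = PetscViewerBinarySkipInfo(viewer);CHKERRQ(ierr);            /* must come before PetscViewerFileSetName() */
ierr = PetscViewerFileSetName(viewer, "matrix.dat");CHKERRQ(ierr);
ierr = MatView(A, viewer);CHKERRQ(ierr);
ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);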

### Why is my parallel solver slower than my sequential solver, or I have poor speed-up?¶

This can happen for many reasons:

1. Make sure it is truly the time in KSPSolve() that is slower (by running the code with -log_view). Often the slower time is in generating the matrix or some other operation.

2. There must be enough work for each process to outweigh the communication time. We recommend an absolute minimum of about 10,000 unknowns per process; better is 20,000 or more.

3. Make sure the communication speed of the parallel computer is good enough for parallel solvers.

4. Check the number of solver iterations with the parallel solver against the sequential solver. Most preconditioners require more iterations when used on more processes; this is particularly true for block Jacobi (the default parallel preconditioner). You can try -pc_type asm (PCASM); its iteration counts scale a bit better as the number of processes grows. You may also consider multigrid preconditioners like PCMG or BoomerAMG in PCHYPRE.

### What steps are necessary to make the pipelined solvers execute efficiently?¶

Pipelined solvers like KSPPGMRES, KSPPIPECG, KSPPIPECR, and KSPGROPPCG may require special MPI configuration to effectively overlap reductions with computation. In general, this requires an MPI-3 implementation, an implementation that supports multiple threads, and use of a “progress thread”. Consult the documentation from your vendor or computing facility for more.

Cray MPI

Cray MPT-5.6 supports MPI-3, but requires setting $MPICH_MAX_THREAD_SAFETY to "multiple" for threads, plus either $MPICH_ASYNC_PROGRESS or $MPICH_NEMESIS_ASYNC_PROGRESS. E.g.

> export MPICH_MAX_THREAD_SAFETY=multiple
> export MPICH_ASYNC_PROGRESS=1
> export MPICH_NEMESIS_ASYNC_PROGRESS=1

MPICH

MPICH version 3.0 and later implements the MPI-3 standard and the default configuration supports use of threads. Use of a progress thread is configured by setting the environment variable $MPICH_ASYNC_PROGRESS. E.g.

> export MPICH_ASYNC_PROGRESS=1


### When using PETSc in single precision mode (--with-precision=single when running configure) are the operations done in single or double precision?¶

PETSc does NOT do any explicit conversion of single precision to double before performing computations; it depends on the hardware and compiler for what happens. For example, the compiler could choose to put the single precision numbers into the usual double precision registers and then use the usual double precision floating point unit. Or it could use SSE2 instructions that work directly on the single precision numbers. It is a bit of a mystery what decisions get made sometimes. There may be compiler flags in some circumstances that can affect this.

### Why is Newton’s method (SNES) not converging, or converges slowly?¶

Newton’s method may not converge for many reasons; here are some of the most common:

• The Jacobian is wrong (or correct in sequential but not in parallel).

• The linear system is not solved or is not solved accurately enough.

• The Jacobian system has a singularity that the linear solver is not handling.

• There is a bug in the function evaluation routine.

• The function is not continuous or does not have continuous first derivatives (e.g. phase change or TVD limiters).

• The equations may not have a solution (e.g. limit cycle instead of a steady state) or there may be a “hill” between the initial guess and the steady state (e.g. reactants must ignite and burn before reaching a steady state, but the steady-state residual will be larger during combustion).

Here are some of the ways to help debug lack of convergence of Newton:

• Run on one processor to see if the problem is only in parallel.

• Run with -info to get more detailed information on the solution process.

• Run with the options

-snes_monitor -ksp_monitor_true_residual -snes_converged_reason -ksp_converged_reason

• If the linear solve does not converge, check if the Jacobian is correct, then see this question.

• If the preconditioned residual converges, but the true residual does not, the preconditioner may be singular.

• If the linear solve converges well, but the line search fails, the Jacobian may be incorrect.

• Run with -pc_type lu or -pc_type svd to see if the problem is a poor linear solver.

• Run with -mat_view or -mat_view draw to see if the Jacobian looks reasonable.

• Run with -snes_test_jacobian -snes_test_jacobian_view to see if the Jacobian you are using is wrong. Compare the output when you add -mat_fd_type ds to see if the result is sensitive to the choice of differencing parameter.

• Run with -snes_mf_operator -pc_type lu to see if the Jacobian you are using is wrong. If the problem is too large for a direct solve, try

-snes_mf_operator -pc_type ksp -ksp_ksp_rtol 1e-12.


Compare the output when you add -mat_mffd_type ds to see if the result is sensitive to choice of differencing parameter.

• Run with -snes_linesearch_monitor to see if the line search is failing (this is usually a sign of a bad Jacobian). Use -info in PETSc 3.1 and older versions, -snes_ls_monitor in PETSc 3.2 and -snes_linesearch_monitor in PETSc 3.3 and later.

Here are some ways to help the Newton process if everything above checks out:

• Run with grid sequencing (-snes_grid_sequence if working with a DM is all you need) to generate a better initial guess on your finer mesh.

• Run with quad precision, i.e.

> ./configure --with-precision=__float128 --download-f2cblaslapack



• Change the units (nondimensionalization), boundary condition scaling, or formulation so that the Jacobian is better conditioned. See Buckingham pi theorem and Dimensional and Scaling Analysis.

• Mollify features in the function that do not have continuous first derivatives (often occurs when there are “if” statements in the residual evaluation, e.g. phase change or TVD limiters). Use a variational inequality solver (SNESVINEWTONRSLS) if the discontinuities are of fundamental importance.

• Try a trust region method (-snes_type newtontr, may have to adjust parameters).

• Run with some continuation parameter from a point where you know the solution, see TSPSEUDO for steady-states.

• There are homotopy solver packages like PHCpack that can get you all possible solutions (and tell you that it has found them all) but those are not scalable and cannot solve anything but small problems.

### Why is the linear solver (KSP) not converging, or converges slowly?¶

Tip

Always run with -ksp_converged_reason -ksp_monitor_true_residual when trying to learn why a method is not converging!

Common reasons for KSP not converging are:

• A symmetric method is being used for a non-symmetric problem.

• The equations are singular by accident (e.g. forgot to impose boundary conditions). Check this for a small problem using -pc_type svd -pc_svd_monitor.

• The equations are intentionally singular (e.g. constant null space), but the Krylov method was not informed, see MatSetNullSpace(). Always inform your local Krylov subspace solver of any change of singularity. Failure to do so will result in the immediate revocation of your computing and keyboard operator licenses, as well as a stern talking-to by the nearest Krylov Subspace Method representative.

• The equations are intentionally singular and MatSetNullSpace() was used, but the right hand side is not consistent. You may have to call MatNullSpaceRemove() on the right hand side before calling KSPSolve(). See MatSetTransposeNullSpace(). (A minimal sketch of handling a constant null space appears after this list.)

• The equations are indefinite so that standard preconditioners don’t work. Usually you will know this from the physics, but you can check with

-ksp_compute_eigenvalues -ksp_gmres_restart 1000 -pc_type none


For simple saddle point problems, try

-pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_detect_saddle_point


For more difficult problems, read the literature to find robust methods and ask petsc-users@mcs.anl.gov or petsc-maint@mcs.anl.gov if you want advice about how to implement them.

• If the method converges in preconditioned residual, but not in true residual, the preconditioner is likely singular or nearly so. This is common for saddle point problems (e.g. incompressible flow) or strongly nonsymmetric operators (e.g. low-Mach hyperbolic problems with large time steps).

• The preconditioner is too weak or is unstable. See if -pc_type asm -sub_pc_type lu improves the convergence rate. If GMRES is losing too much progress in the restart, see if longer restarts help -ksp_gmres_restart 300. If a transpose is available, try -ksp_type bcgs or other methods that do not require a restart.

Note

Unfortunately convergence with these methods is frequently erratic.

• The preconditioner is nonlinear (e.g. a nested iterative solve), try -ksp_type fgmres or -ksp_type gcr.

• You are using geometric multigrid, but some equations (often boundary conditions) are not scaled compatibly between levels. Try -pc_mg_galerkin both to algebraically construct a correctly scaled coarse operator or make sure that all the equations are scaled in the same way if you want to use rediscretized coarse levels.

• The matrix is very ill-conditioned. Check the condition number.

• Try to improve it by choosing the relative scaling of components/boundary conditions.

• Try -ksp_diagonal_scale -ksp_diagonal_scale_fix.

• Perhaps change the formulation of the problem to produce more friendly algebraic equations.

• Change the units (nondimensionalization), boundary condition scaling, or formulation so that the Jacobian is better conditioned. See Buckingham pi theorem and Dimensional and Scaling Analysis.

• Classical Gram-Schmidt is becoming unstable, try -ksp_gmres_modifiedgramschmidt or use a method that orthogonalizes differently, e.g. -ksp_type gcr.
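
As referenced in the list above, here is a minimal sketch of informing the solver about a constant null space and making the right hand side consistent (A and b stand for your assembled matrix and right hand side):

MatNullSpace   nullsp;
PetscErrorCode ierr;

/* Tell the solver the operator has a constant null space (e.g. a pure Neumann problem) */
ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullsp);CHKERRQ(ierr);
ierr = MatSetNullSpace(A, nullsp);CHKERRQ(ierr);
/* Make the right hand side consistent by removing its component in the null space */
ierr = MatNullSpaceRemove(nullsp, b);CHKERRQ(ierr);
ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);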

### I get the error message: Actual argument at (1) to assumed-type dummy is of derived type with type-bound or FINAL procedures¶

Use the following code-snippet:

module context_module
#include <petsc/finclude/petsc.h>
  use petsc
  implicit none
  private
  type, public :: context_type
     private
     PetscInt :: foo
   contains
     procedure, public :: init => context_init
  end type context_type
contains
  subroutine context_init(self, foo)
    class(context_type), intent(in out) :: self
    PetscInt, intent(in) :: foo
    self%foo = foo
  end subroutine context_init
end module context_module

!------------------------------------------------------------------------

program test_snes
  use iso_c_binding
  use petsc
  use context_module
  implicit none

  SNES :: snes
  type(context_type), target :: context
  type(c_ptr) :: contextOut
  PetscErrorCode :: ierr

  call PetscInitialize(PETSC_NULL_CHARACTER, ierr)
  call SNESCreate(PETSC_COMM_WORLD, snes, ierr)
  call context%init(1)

  contextOut = c_loc(context) ! contextOut is a C pointer to context

  call SNESSetConvergenceTest(snes, convergence, contextOut, PETSC_NULL_FUNCTION, ierr)

  call SNESDestroy(snes, ierr)
  call PetscFinalize(ierr)

contains

  subroutine convergence(snes, num_iterations, xnorm, pnorm, fnorm, reason, contextIn, ierr)
    SNES, intent(in) :: snes
    PetscInt, intent(in) :: num_iterations
    PetscReal, intent(in) :: xnorm, pnorm, fnorm
    SNESConvergedReason, intent(out) :: reason
    type(c_ptr), intent(in out) :: contextIn
    type(context_type), pointer :: context
    PetscErrorCode, intent(out) :: ierr

    call c_f_pointer(contextIn, context)  ! convert the C pointer to a Fortran pointer to use context as in the main program
    reason = 0
    ierr = 0
  end subroutine convergence
end program test_snes


### In C++ I get a crash on VecDestroy() (or some other PETSc object) at the end of the program¶

This can happen when the destructor for a C++ class is automatically called at the end of the program after PetscFinalize(). Use the following code-snippet:

int main(int argc, char **argv)
{
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  {

    ...   /* all your destructors are called here automatically by C++, before PetscFinalize(), so they work correctly */
  }
  ierr = PetscFinalize();if (ierr) return ierr;
  return 0;
}


## Debugging¶

### How do I turn off PETSc signal handling so I can use the -C Option On xlf?¶

Immediately after calling PetscInitialize() call PetscPopSignalHandler().

Some Fortran compilers, including the IBM xlf and xlF compilers, have a compile option (-C for IBM's) that checks that all Fortran array accesses are in-bounds. This is a great feature but does require that the array dimensions be set explicitly, not with a *.
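
A minimal sketch of the calling order (argc/argv handling as in any PETSc main program):

ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
/* hand signal handling back to the system so the compiler's runtime checks report errors themselves */
ierr = PetscPopSignalHandler();CHKERRQ(ierr);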

### How do I debug if -start_in_debugger does not work on my machine?¶

The script https://github.com/Azrael3000/tmpi can be used to run multiple MPI ranks in the debugger using tmux.

On newer macOS machines one has to be in the admin group to be able to use the debugger.

On newer Ubuntu Linux machines one has to disable ptrace_scope with

> sudo echo 0 > /proc/sys/kernel/yama/ptrace_scope


to get -start_in_debugger working.

If -start_in_debugger does not really work on your OS, for a uniprocessor job, just try the debugger directly, for example: gdb ex1. You can also use TotalView which is a good graphical parallel debugger.

### How do I see where my code is hanging?¶

You can use the -start_in_debugger option to start all processes in the debugger (each will come up in its own xterm) or run in TotalView. Then use cont (for continue) in each xterm. Once you are sure that the program is hanging, hit control-c in each xterm and then use ‘where’ to print a stack trace for each process.

### How can I inspect PETSc vector and matrix values when in the debugger?¶

I will illustrate this with gdb, but it should be similar on other debuggers. You can look at local Vec values directly by obtaining the array. For a Vec v, we can print all local values using:

(gdb) p ((Vec_Seq*) v->data)->array[0]@v->map.n


However, this becomes much more complicated for a matrix. Therefore, it is advisable to use the default viewer to look at the object. For a Vec v and a Mat m, this would be:

(gdb) call VecView(v, 0)
(gdb) call MatView(m, 0)


or with a communicator other than MPI_COMM_WORLD:

(gdb) call MatView(m, PETSC_VIEWER_STDOUT_(m->comm))


Totalview 8.8.0+ has a new feature that allows libraries to provide their own code to display objects in the debugger. Thus in theory each PETSc object, Vec, Mat etc could have custom code to print values in the object. We have only done this for the most elementary display of Vec and Mat. See the routine TV_display_type() in src/vec/vec/interface/vector.c for an example of how these may be written. Contact us if you would like to add more.

### How can I find the cause of floating point exceptions like not-a-number (NaN) or infinity?¶

The best way to locate floating point exceptions is to use a debugger. On supported architectures (including Linux and glibc-based systems), just run in a debugger and pass -fp_trap to the PETSc application. This will activate signaling exceptions and the debugger will break on the line that first divides by zero or otherwise generates an exception.

Without a debugger, running with -fp_trap in debug mode will only identify the function in which the error occurred, but not the line or the type of exception. If -fp_trap is not supported on your architecture, consult the documentation for your debugger since there is likely a way to have it catch exceptions.

### Error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory¶

The Intel compilers use shared libraries (like libimf) that cannot be found by default at run time. When using the Intel compilers (and running the resulting code) you must make sure that the proper Intel initialization scripts are run. This is usually done by putting some code into your .cshrc, .bashrc, .profile etc. file. Sometimes on batch systems that do not access your initialization files (like .cshrc) you must include the initialization calls in your batch file submission.

For example, on my Mac using csh I have the following in my .cshrc file:

source /opt/intel/cc/10.1.012/bin/iccvars.csh
source /opt/intel/fc/10.1.012/bin/ifortvars.csh
source /opt/intel/idb/10.1.012/bin/idbvars.csh


And in my .profile I have

source /opt/intel/cc/10.1.012/bin/iccvars.sh
source /opt/intel/fc/10.1.012/bin/ifortvars.sh
source /opt/intel/idb/10.1.012/bin/idbvars.sh


### What does “Object Type Not Set: Argument # N” Mean?¶

Many operations on PETSc objects require that the specific type of the object be set before the operation is performed. You must call XXXSetType() or XXXSetFromOptions() before you make the offending call. For example

Mat            A;
PetscErrorCode ierr;

ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
ierr = MatSetValues(A,....);CHKERRQ(ierr);


will not work. You must add MatSetType() or MatSetFromOptions() before the call to MatSetValues(). I.e.

Mat            A;
PetscErrorCode ierr;

ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);

ierr = MatSetType(A, MATAIJ);CHKERRQ(ierr);
/* or MatSetFromOptions(), which will override MatSetType() if a type is given on the command line */
ierr = MatSetFromOptions(A);CHKERRQ(ierr);

ierr = MatSetValues(A,....);CHKERRQ(ierr);


### What does error detected in PetscSplitOwnership() about “sum of local lengths …”: mean?¶

In a previous call to VecSetSizes(), MatSetSizes(), VecCreateXXX() or MatCreateXXX() you passed in local and global sizes that do not make sense for the correct number of processors. For example if you pass in a local size of 2 and a global size of 100 and run on two processors, this cannot work since the sum of the local sizes is 4, not 100.
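
For illustration, a sketch of consistent sizes when running on two processes (the sizes are arbitrary example values):

Vec            v;
PetscErrorCode ierr;

ierr = VecCreate(PETSC_COMM_WORLD, &v);CHKERRQ(ierr);
/* On 2 processes this is consistent: the local sizes of 50 sum to the global size of 100.  */
/* Alternatively pass PETSC_DECIDE as the local size and let PETSc split the global length. */
ierr = VecSetSizes(v, 50, 100);CHKERRQ(ierr);
ierr = VecSetFromOptions(v);CHKERRQ(ierr);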

### What does “corrupt argument” or “caught signal” Or “SEGV” Or “segmentation violation” Or “bus error” mean? Can I use Valgrind or CUDA-Memcheck to debug memory corruption issues?¶

Sometimes it can mean an argument to a function is invalid. In Fortran this may be caused by forgetting to list an argument in the call, especially the final PetscErrorCode.

Otherwise it is usually caused by memory corruption; that is, somewhere the code is writing out of array bounds. To track this down, rerun the debug version of the code with the option -malloc_debug. Occasionally the code may crash only with the optimized version; in that case run the optimized version with -malloc_debug. If you determine the problem is from memory corruption, you can put the macro CHKMEMQ in the code near the crash to determine exactly what line is causing the problem.

If -malloc_debug does not help: on NVIDIA CUDA systems you can use https://docs.nvidia.com/cuda/cuda-memcheck/index.html

If -malloc_debug does not help: on GNU/Linux and (supported) macOS machines - you can use valgrind. Follow the below instructions:

1. configure PETSc with --download-mpich --with-debugging.

2. On macOS you need to:

1. use valgrind from https://github.com/LouisBrunner/valgrind-macos. Follow the Usage instructions in the README.md on that page (no need to clone the repository).

2. use the additional configure options --download-fblaslapack or --download-f2cblaslapack

3. use the additional valgrind option --dsymutil=yes

3. Compile your application code with this build of PETSc.

4. Run with valgrind.

> $PETSC_DIR/lib/petsc/bin/petscmpiexec -valgrind -n NPROC PETSCPROGRAMNAME PROGRAMOPTIONS

or

> mpiexec -n NPROC valgrind --tool=memcheck -q --num-callers=20 \
    --suppressions=$PETSC_DIR/share/petsc/valgrind/petsc-val.supp \
    --log-file=valgrind.log.%p PETSCPROGRAMNAME -malloc off PROGRAMOPTIONS


Note

• option --with-debugging enables valgrind to give stack trace with additional source-file:line-number info.

• option --download-mpich is valgrind clean, other MPI builds are not valgrind clean.

• when --download-mpich is used, mpiexec will be in $PETSC_DIR/$PETSC_ARCH/bin

• the --log-file=valgrind.log.%p option tells valgrind to store the output from each process in a different file (as %p, i.e. the PID, is different for each MPI process).

• memcheck will not find certain array accesses that violate static array declarations, so if memcheck runs clean you can try --tool=exp-ptrcheck instead.

You might also consider using http://drmemory.org which has support for GNU/Linux, Apple Mac OS and Microsoft Windows machines. (Note we haven’t tried this ourselves).

### What does “detected zero pivot in LU factorization” mean?¶

A zero pivot in LU, ILU, Cholesky, or ICC sparse factorization does not always mean that the matrix is singular. You can use

-pc_factor_shift_type nonzero -pc_factor_shift_amount [amount]


or

-pc_factor_shift_type positive_definite -[level]_pc_factor_shift_type nonzero
-pc_factor_shift_amount [amount]


or

-[level]_pc_factor_shift_type positive_definite


to prevent the zero pivot. [level] is "sub" when lu, ilu, cholesky, or icc are employed in each individual block of the bjacobi or ASM preconditioner. [level] is "mg_levels" or "mg_coarse" when lu, ilu, cholesky, or icc are used inside the multigrid smoothers or for the coarse grid solver. See PCFactorSetShiftType(), PCFactorSetShiftAmount().

This error can also happen if your matrix is singular, see MatSetNullSpace() for how to handle this. If this error occurs in the zeroth row of the matrix, it is likely you have an error in the code that generates the matrix.
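
The same shifts can also be set programmatically; a minimal sketch, assuming ksp is your KSP and the shift amount is only an example value:

PC             pc;
PetscErrorCode ierr;

ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
ierr = PCFactorSetShiftType(pc, MAT_SHIFT_NONZERO);CHKERRQ(ierr);   /* or MAT_SHIFT_POSITIVE_DEFINITE */
ierr = PCFactorSetShiftAmount(pc, 1.e-10);CHKERRQ(ierr);            /* example amount only */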

### You create draw windows or ViewerDraw windows or use options -ksp_monitor draw::draw_lg or -snes_monitor draw::draw_lg and the program seems to run OK but windows never open¶

The libraries were compiled without support for X windows. Make sure that configure was run with the option --with-x.

### The program seems to use more and more memory as it runs, even though you don’t think you are allocating more memory¶

Some of the following may be occurring:

• You are creating new PETSc objects but never freeing them.

• There is a memory leak in PETSc or your code.

• Something much more subtle (if you are using Fortran): when you declare a large array in Fortran, the operating system does not allocate all the memory pages for that array until you start using the different locations in the array. Thus, in a code, if at each step you start using later values in the array your virtual memory usage will "continue" to increase as measured by ps or top.

• You are running with the -log, -log_mpe, or -log_all option. With these options, a great deal of logging information is stored in memory until the conclusion of the run.

• You are linking with the MPI profiling libraries; these cause logging of all MPI activities. Another symptom is at the conclusion of the run it may print some message about writing log files.

The following may help:

• Run with the -malloc_debug option and -malloc_view. Or use PetscMallocDump() and PetscMallocView() sprinkled about your code to track memory that is allocated and not later freed. Use the commands PetscMallocGetCurrentUsage() and PetscMemoryGetCurrentUsage() to monitor memory allocated and PetscMallocGetMaximumUsage() and PetscMemoryGetMaximumUsage() for total memory used as the code progresses.

• This is just the way Unix works and is harmless.

• Do not use the -log, -log_mpe, or -log_all option, or use PetscLogEventDeactivate() or PetscLogEventDeactivateClass() to turn off logging of specific events.

• Make sure you do not link with the MPI profiling libraries.

### When calling MatPartitioningApply() you get a message “Error! Key 16615 Not Found”¶

The graph of the matrix you are using is not symmetric. You must use symmetric matrices for partitioning.

### With GMRES at restart the second residual norm printed does not match the first¶

I.e.

26 KSP Residual norm 3.421544615851e-04
27 KSP Residual norm 2.973675659493e-04
28 KSP Residual norm 2.588642948270e-04
29 KSP Residual norm 2.268190747349e-04
30 KSP Residual norm 1.977245964368e-04
30 KSP Residual norm 1.994426291979e-04 <----- At restart the residual norm is printed a second time


This is actually not surprising! GMRES computes the norm of the residual at each iteration via a recurrence relation between the norms of the residuals at the previous iterations and quantities computed at the current iteration. It does not compute it directly via $$\| b - A x^{n} \|$$.

Sometimes, especially with an ill-conditioned matrix, or computation of the matrix-vector product via differencing, the residual norms computed by GMRES start to “drift” from the correct values. At the restart, we compute the residual norm directly, hence the “strange stuff,” the difference printed. The drifting, if it remains small, is harmless (doesn’t affect the accuracy of the solution that GMRES computes).

If you use a more powerful preconditioner the drift will often be smaller and less noticeable. Or if you are running matrix-free you may need to tune the matrix-free parameters.

### Why do some Krylov methods seem to print two residual norms per iteration?¶

I.e.

1198 KSP Residual norm 1.366052062216e-04
1198 KSP Residual norm 1.931875025549e-04
1199 KSP Residual norm 1.366026406067e-04
1199 KSP Residual norm 1.931819426344e-04


Some Krylov methods, for example KSPTFQMR, actually have a “sub-iteration” of size 2 inside the loop. Each of the two substeps has its own matrix vector product and application of the preconditioner and updates the residual approximations. This is why you get this “funny” output where it looks like there are two residual norms per iteration. You can also think of it as twice as many iterations.

### Unable to locate PETSc dynamic library libpetsc¶

When using dynamic libraries, the libraries cannot be moved after they are installed. This can also happen on clusters where the paths are different on the (run) compute nodes than on the (compile) front-end. Do not use dynamic and shared libraries: run configure with --with-shared-libraries=0 --with-dynamic-loading=0.

Important

This option has been removed in PETSc v3.5.

### How do I determine what update to PETSc broke my code?¶

If at some point (in PETSc code history) you had a working code, but the latest PETSc code broke it, it is possible to determine the PETSc code change that caused this behavior. This is achieved by:

• Using Git to access PETSc sources

• Knowing the Git commit for the known working version of PETSc

• Knowing the Git commit for the known broken version of PETSc

• Using the bisect functionality of Git

Git bisect can be done as follows:

1. Get PETSc development (main branch in git) sources

> git clone https://gitlab.com/petsc/petsc.git

2. Find the good and bad markers to start the bisection process. This can be done either by checking git log or gitk or https://gitlab.com/petsc/petsc or the web history of petsc-release clones. Let's say the known bad commit is 21af4baa815c and the known good commit is 5ae5ab319844.

3. Start the bisection process with these known revisions. Build PETSc, and test your code to confirm known good/bad behavior:

> git bisect start 21af4baa815c 5ae5ab319844


build/test, perhaps discover that this new state is bad

> git bisect bad


build/test, perhaps discover that this state is good

> git bisect good


Now, until done, keep bisecting, building PETSc, and testing your code with it to determine whether the code works. After something like 5-15 iterations, git bisect will pinpoint the exact code change that resulted in the difference in application behavior.

Tip

See git-bisect(1) and the debugging section of the Git Book for more debugging tips.

### How to fix the error “PMIX Error: error in file gds_ds12_lock_pthread.c”?¶

This seems to be an error when using OpenMPI and OpenBLAS with threads (or perhaps other packages that use threads). Setting

> export PMIX_MCA_gds=hash


should resolve the problem.

## Shared Libraries¶

### Can I install PETSc libraries as shared libraries?¶

Yes. Use

> ./configure --with-shared-libraries


### Why should I use shared libraries?¶

When you link to shared libraries, the function symbols from the shared libraries are not copied into the executable. This way the size of the executable is considerably smaller than when using regular libraries. This helps in a couple of ways:

• Saves disk space when more than one executable is created

• Improves the compile time immensely, because the compiler has to write a much smaller file (executable) to the disk.

### What do I do if I want to move my executable to a different machine?¶

You would also need to have access to the shared libraries on this new machine. The other alternative is to build the executable without shared libraries by first deleting the shared libraries and then creating the executable.
