Start timer for device
When CUDA or HIP is enabled, the timer runs on the GPU and separately logs the time devoted to GPU computations (excluding kernel launch times).
When CUDA or HIP is not available, the timer runs on the CPU and separately logs the time devoted to GPU computations (including kernel launch times).
This timer should NOT include times for data transfers between the GPU and CPU, nor setup actions such as allocating space.
The regular logging captures the time for data transfers and any CPU activities during the event.
This timer is used to compute the flop rate on the GPU while it is actively engaged in running a kernel.
The GPU event timer captures the execution time of all the kernels launched in the default stream by the CPU between
PetscLogGpuTimeBegin() and PetscLogGpuTimeEnd(). PetscLogGpuTimeBegin() and PetscLogGpuTimeEnd() insert the begin and end events
into the default stream (stream 0). The device records a time stamp for an
event when it reaches that event in the stream. The function xxxEventSynchronize() is called in
PetscLogGpuTimeEnd() to block CPU execution,
but not continued GPU execution, until the timer event is recorded.
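
The event mechanism described above can be illustrated with the plain CUDA runtime API. This is a minimal sketch, not the PETSc implementation: it assumes that "xxx" stands for the cuda (or hip) prefix, and the kernel scale() and the helper time_scale_kernel_ms() are hypothetical names introduced only for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel; it only gives the timer something to measure.
__global__ void scale(float *x, float a, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= a;
}

// Hypothetical helper: returns the device-side execution time of one
// scale() launch in milliseconds, or -1 if a CUDA call fails.
float time_scale_kernel_ms(int n)
{
  float      *d_x = NULL;
  float       ms  = -1.0f;
  cudaEvent_t begin, end;

  // Setup (allocation, initialization) is issued before the begin event,
  // so it is not part of the measured interval.
  if (cudaMalloc(&d_x, n * sizeof(float)) != cudaSuccess) return -1.0f;
  cudaMemset(d_x, 0, n * sizeof(float));
  cudaEventCreate(&begin);
  cudaEventCreate(&end);

  // Insert the begin and end events into the default stream (stream 0)
  // around the kernel launch. The device stamps each event when it
  // reaches that event in the stream.
  cudaEventRecord(begin, 0);
  scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);
  cudaEventRecord(end, 0);

  // Block the CPU until the end event has been recorded.
  cudaEventSynchronize(end);
  if (cudaEventElapsedTime(&ms, begin, end) != cudaSuccess) ms = -1.0f;

  cudaEventDestroy(begin);
  cudaEventDestroy(end);
  cudaFree(d_x);
  return ms;
}

int main(void)
{
  float ms = time_scale_kernel_ms(1 << 20);
  if (ms < 0.0f) {
    fprintf(stderr, "CUDA timing failed (is a device available?)\n");
    return 1;
  }
  printf("kernel execution time: %f ms\n", ms);
  return 0;
}
```

Because both events are time-stamped by the device, the elapsed time reflects kernel execution in the stream rather than the CPU cost of launching the kernel, which matches the "excluding kernel launch times" behavior noted above.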