6. Monitoring Tools
Bifrost provides some command-line tools for monitoring the performance of running pipelines. These live in the tools/ directory of the repository.
6.1. NVIDIA Profiler
The NVIDIA Profiler and Visual Profiler tools (part of the CUDA Toolkit) can be used to profile and visualize Bifrost pipelines. Applications can be launched directly from the Visual Profiler (nvvp), or a profile can first be generated using the nvprof command line tool:
$ nvprof -o my_pipeline.nvprof python my_pipeline.py
The generated .nvprof file can then be imported into the Visual Profiler for visualization and analysis.
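If it is more convenient, the same nvprof invocation can be wrapped in a small launcher script. A minimal sketch (my_pipeline.py and the output filename are placeholders):

# Run the nvprof command shown above from Python, writing the profile to a
# timestamped file. "my_pipeline.py" is a placeholder for your pipeline script.
import subprocess
import time

outfile = "my_pipeline_%d.nvprof" % int(time.time())
subprocess.check_call(["nvprof", "-o", outfile, "python", "my_pipeline.py"])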
To obtain a more detailed profile of pipeline execution, reconfigure and rebuild the bifrost library with "trace" enabled using ./configure --enable-trace.
6.2. Pipeline in /dev/shm
Details about the currently running bifrost pipeline are available in the /dev/shm directory on Linux. They are mapped into a directory structure (use the Linux tree utility to view it):
dancpr@bldcpr:/bldata/bifrost/tools$ tree /dev/shm/bifrost
/dev/shm/bifrost
└── 17263
    └── Pipeline_0
        ├── AccumulateBlock_0
        │   ├── bind
        │   ├── in
        │   ├── out
        │   ├── perf
        │   └── sequence0
        ├── BlockScope_1
        │   ├── PrintHeaderBlock_0
        │   │   ├── bind
        │   │   ├── in
        │   │   ├── out
        │   │   ├── perf
        │   │   └── sequence0
        │   └── TransposeBlock_0
        │       ├── bind
        │       ├── in
        │       ├── out
        │       ├── perf
        │       └── sequence0
        ├── BlockScope_13
        ├...
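Each leaf shown above (bind, in, out, perf, sequence0) can be inspected directly while the pipeline runs. As a minimal sketch, using only the Python standard library and assuming these entries are small plain-text files, the perf entry of every block could be dumped like so:

# Walk the per-PID/per-pipeline directory structure shown above and print
# the raw contents of each block's "perf" file.
import os

ROOT = "/dev/shm/bifrost"

for dirpath, dirnames, filenames in os.walk(ROOT):
    if "perf" in filenames:
        # e.g. block == "17263/Pipeline_0/AccumulateBlock_0"
        block = os.path.relpath(dirpath, ROOT)
        with open(os.path.join(dirpath, "perf")) as f:
            print(block)
            print(f.read())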
6.3. like_top.py
The main performance monitoring tool is like_top.py. This is, as the name suggests, like the Linux utility top:
like_top.py - bldcpr - load average: 0.59, 0.14, 0.05
Processes: 516 total, 1 running
CPU(s): 1.9%us, 1.4%sy, 0.0%ni, 84.5%id, 12.1%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32341840k total, 19834116k used, 12507724k free, 515556k buffers
Swap: 32938492k total, 767408k used, 32171084k free, 17982316k cached
PID Block Core %CPU Total Acquire Process Reserve Cmd
19154 GuppiRawSourceB 0 9.4 0.714 0.000 0.714 0.000 python ./bf_gpuspec_midres.py ../pulsa
19154 FftBlock_0 3 4.4 0.733 0.699 0.034 0.000 python ./bf_gpuspec_midres.py ../pulsa
19154 CopyBlock_0 2 4.4 0.722 0.700 0.021 0.000 python ./bf_gpuspec_midres.py ../pulsa
19154 TransposeBlock_ 1 3.5 0.710 0.695 0.015 0.000 python ./bf_gpuspec_midres.py ../pulsa
19154 HdfWriteBlock_0 6 0.4 3.220 3.213 0.007 0.000 python ./bf_gpuspec_midres.py ../pulsa
19154 DetectBlock_0 4 1.0 0.738 0.733 0.005 0.000 python ./bf_gpuspec_midres.py ../pulsa
19154 FftShiftBlock_0 3 4.4 0.738 0.734 0.005 0.000 python ./bf_gpuspec_midres.py ../pulsa
19154 CopyBlock_1 6 0.4 2.816 2.813 0.003 0.000 python ./bf_gpuspec_midres.py ../pulsa
19154 AccumulateBlock 5 4.0 0.005 0.005 0.001 0.000 python ./bf_gpuspec_midres.py ../pulsa
19154 PrintHeaderBloc -1 3.220 3.220 0.000 0.000 python ./bf_gpuspec_midres.py ../pulsa
Acquire is the time spent waiting for input (i.e., waiting on upstream blocks),
Process is the time spent processing data, and
Reserve is the time spent waiting for output space to become available in the ring (i.e., waiting for downstream blocks).
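For example, the HdfWriteBlock_0 row above has Total = 3.220, Acquire = 3.213, Process = 0.007 and Reserve = 0.000, so nearly all of its time is spent waiting on upstream blocks. A toy sketch of that arithmetic:

# Break down the HdfWriteBlock_0 row from the like_top.py output above.
total, acquire, process, reserve = 3.220, 3.213, 0.007, 0.000
print("waiting on upstream:   %5.1f%%" % (100.0 * acquire / total))
print("processing:            %5.1f%%" % (100.0 * process / total))
print("waiting on downstream: %5.1f%%" % (100.0 * reserve / total))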
Note: The CPU fraction will probably be 100% on any GPU block because it’s currently set to spin (busy loop) while waiting for the GPU.