Monitoring Tools
================

Bifrost provides some command-line tools for monitoring the performance of
running pipelines. These can be found in the ``tools/`` directory.

NVIDIA Profiler
---------------

The NVIDIA Profiler and Visual Profiler tools (part of the CUDA Toolkit) can
be used to profile and visualize Bifrost pipelines. Applications can be
launched directly from the Visual Profiler (``nvvp``), or a profile can first
be generated using the ``nvprof`` command-line tool:

.. code::

    $ nvprof -o my_pipeline.nvprof python my_pipeline.py

The generated ``.nvprof`` file can then be imported into the Visual Profiler
for visualisation and analysis.

To obtain a more detailed profile of pipeline execution, reconfigure and
rebuild the bifrost library with "trace" enabled using
``./configure --enable-trace``.

Pipeline in /dev/shm
--------------------

Details about the currently running bifrost pipeline are available in the
``/dev/shm`` directory on Linux. They are mapped into a directory structure
(use the Linux ``tree`` utility to view it):

.. code::

    dancpr@bldcpr:/bldata/bifrost/tools$ tree /dev/shm/bifrost
    /dev/shm/bifrost
    └── 17263
        └── Pipeline_0
            ├── AccumulateBlock_0
            │   ├── bind
            │   ├── in
            │   ├── out
            │   ├── perf
            │   └── sequence0
            ├── BlockScope_1
            │   ├── PrintHeaderBlock_0
            │   │   ├── bind
            │   │   ├── in
            │   │   ├── out
            │   │   ├── perf
            │   │   └── sequence0
            │   └── TransposeBlock_0
            │       ├── bind
            │       ├── in
            │       ├── out
            │       ├── perf
            │       └── sequence0
            ├── BlockScope_13
            ├...

A couple of short sketches at the end of this section show how these files can
be read directly from a script.

like_top.py
-----------

The main performance monitoring tool is ``like_top.py``. This is, as the name
suggests, like the Linux utility ``top``:

.. code::

    like_top.py - bldcpr - load average: 0.59, 0.14, 0.05
    Processes: 516 total, 1 running
    CPU(s):  1.9%us,  1.4%sy,  0.0%ni, 84.5%id, 12.1%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   32341840k total, 19834116k used, 12507724k free,   515556k buffers
    Swap:  32938492k total,   767408k used, 32171084k free, 17982316k cached

      PID            Block  Core  %CPU  Total  Acquire  Process  Reserve  Cmd
    19154  GuppiRawSourceB     0   9.4  0.714    0.000    0.714    0.000  python ./bf_gpuspec_midres.py ../pulsa
    19154  FftBlock_0          3   4.4  0.733    0.699    0.034    0.000  python ./bf_gpuspec_midres.py ../pulsa
    19154  CopyBlock_0         2   4.4  0.722    0.700    0.021    0.000  python ./bf_gpuspec_midres.py ../pulsa
    19154  TransposeBlock_     1   3.5  0.710    0.695    0.015    0.000  python ./bf_gpuspec_midres.py ../pulsa
    19154  HdfWriteBlock_0     6   0.4  3.220    3.213    0.007    0.000  python ./bf_gpuspec_midres.py ../pulsa
    19154  DetectBlock_0       4   1.0  0.738    0.733    0.005    0.000  python ./bf_gpuspec_midres.py ../pulsa
    19154  FftShiftBlock_0     3   4.4  0.738    0.734    0.005    0.000  python ./bf_gpuspec_midres.py ../pulsa
    19154  CopyBlock_1         6   0.4  2.816    2.813    0.003    0.000  python ./bf_gpuspec_midres.py ../pulsa
    19154  AccumulateBlock     5   4.0  0.005    0.005    0.001    0.000  python ./bf_gpuspec_midres.py ../pulsa
    19154  PrintHeaderBloc    -1        3.220    3.220    0.000    0.000  python ./bf_gpuspec_midres.py ../pulsa

In this display:

* Acquire is the time spent waiting for input (i.e., waiting on upstream
  blocks),
* Process is the time spent processing data, and
* Reserve is the time spent waiting for output space to become available in
  the ring (i.e., waiting on downstream blocks).

Note: The CPU fraction will probably be 100% on any GPU block because it's
currently set to spin (busy loop) while waiting for the GPU.
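
For quick ad-hoc scripting, the same information that ``like_top.py`` displays
can be pulled straight from the ``/dev/shm/bifrost`` tree shown above. The
following is a minimal sketch, not part of Bifrost itself: it only assumes the
directory layout from the ``tree`` listing
(``/dev/shm/bifrost/<pid>/.../<block>/perf``) and uses nothing beyond the
Python standard library.

.. code:: python

    # Minimal sketch: enumerate the blocks of running Bifrost pipelines by
    # walking the /dev/shm/bifrost proclog tree shown above. Only the layout
    # from the `tree` listing is assumed; everything else is standard library.
    import os
    from pathlib import Path

    PROCLOG_ROOT = Path("/dev/shm/bifrost")

    def list_blocks(root=PROCLOG_ROOT):
        """Yield (pid, block_directory) for every directory holding a 'perf' file."""
        for dirpath, dirnames, filenames in os.walk(root):
            if "perf" in filenames:
                path = Path(dirpath)
                pid = path.relative_to(root).parts[0]  # first component is the PID
                yield pid, path

    if __name__ == "__main__":
        for pid, block in list_blocks():
            print(f"PID {pid}: {block.relative_to(PROCLOG_ROOT / pid)}")

Running this while a pipeline is up prints one line per block, e.g.
``PID 17263: Pipeline_0/AccumulateBlock_0``, matching the hierarchy from the
``tree`` output.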
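
The Acquire, Process, and Reserve columns in ``like_top.py`` come from each
block's ``perf`` file in this tree. As an illustration only, the sketch below
parses a single ``perf`` file and reports what fraction of a block's time goes
to each phase. The ``acquire_time``/``process_time``/``reserve_time`` key names
and the ``key : value`` text format are assumptions; check the file on your own
system and adjust the parsing accordingly (the bundled ``like_top.py`` uses
Bifrost's own log-reading code rather than this ad-hoc parser).

.. code:: python

    # Minimal sketch: summarise one block's time budget from its proclog 'perf'
    # file, mirroring the Acquire/Process/Reserve columns of like_top.py.
    # ASSUMPTIONS: the file is plain text with "key : value" lines, and the keys
    # acquire_time/process_time/reserve_time exist -- verify on your system.
    import sys
    from pathlib import Path

    def read_keyvals(path):
        """Parse 'key : value' lines into floats; silently skip anything else."""
        values = {}
        for line in Path(path).read_text().splitlines():
            key, sep, raw = line.partition(":")
            if not sep:
                continue
            try:
                values[key.strip()] = float(raw.strip())
            except ValueError:
                pass  # non-numeric entries are not needed for this summary
        return values

    def summarise(perf_path):
        perf = read_keyvals(perf_path)
        acquire = perf.get("acquire_time", 0.0)   # waiting on upstream blocks
        process = perf.get("process_time", 0.0)   # doing actual work
        reserve = perf.get("reserve_time", 0.0)   # waiting on downstream blocks
        total = acquire + process + reserve
        if total == 0.0:
            return "no timing data found"
        return (f"total {total:.3f}s: "
                f"{100 * acquire / total:.1f}% acquire, "
                f"{100 * process / total:.1f}% process, "
                f"{100 * reserve / total:.1f}% reserve")

    if __name__ == "__main__":
        # e.g. python perf_summary.py /dev/shm/bifrost/<pid>/Pipeline_0/<block>/perf
        print(summarise(sys.argv[1]))

The same reasoning applies here as in the ``like_top.py`` columns: a block that
spends most of its time in acquire is being starved by its upstream neighbours,
while a large reserve fraction points at a slow consumer downstream.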