7. Fast GPU math using bfMap
A key under-the-hood part of bifrost is the map function, which provides a simple way to do fast arithmetic operations on the GPU. To get acquainted, we will ignore all of the pipeline infrastructure that bifrost provides and just call the map function directly.
Let’s suppose you have two ndarrays and wish to add them together on the GPU. Here is how you would do that with map:
import bifrost as bf
import numpy as np

# Create two arrays on the GPU, a and b, and an empty output c
a = bf.ndarray([1, 2, 3, 4, 5], space='cuda')
b = bf.ndarray([1, 0, 1, 0, 1], space='cuda')
c = bf.ndarray(np.zeros(5), space='cuda')
# Add them together
bf.map("c = a + b", data={'c': c, 'a': a, 'b': b})
print(c)
# ndarray([ 2., 2., 4., 4., 6.])
The map function figures out what to do based on the string you give it. These would also work:
bf.map("c = a - b", data={'c': c, 'a': a, 'b': b})
bf.map("c = a * b", data={'c': c, 'a': a, 'b': b})
bf.map("c = a / b", data={'c': c, 'a': a, 'b': b})
bf.map("c += a + b", data={'c': c, 'a': a, 'b': b})
bf.map("c /= a + b", data={'c': c, 'a': a, 'b': b})
bf.map("c *= a + b", data={'c': c, 'a': a, 'b': b})
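Each of these strings is an elementwise expression, and the compound-assignment forms (+=, /=, *=) read the existing contents of c before writing. As a sanity check (this is a CPU sketch using plain NumPy, not bifrost code), the same elementwise semantics look like this:

```python
import numpy as np

a = np.array([1., 2., 3., 4., 5.])
b = np.array([1., 0., 1., 0., 1.])

# Equivalent of bf.map("c = a + b", ...)
c = a + b
print(c)    # [2. 2. 4. 4. 6.]

# Equivalent of bf.map("c += a + b", ...): reads the old contents of c
c += a + b
print(c)    # [ 4.  4.  8.  8. 12.]
```

The GPU version computes the same values; the difference is that bf.map compiles the expression string into a CUDA kernel and runs it over the arrays in device memory.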
Getting deeper requires a look at the docstring:
def map(func_string, *args, **kwargs):
    """Apply a function to a set of ndarrays.

    Args:
        func_string (str): The function to apply to the arrays, as a string
            (see below for examples).
        data (dict): Map of string names to ndarrays or scalars.
        axis_names (list): List of string names by which each axis is
            referenced in func_string.
        shape: The shape of the computation. If None, the broadcast shape
            of all data arrays is used.
        func_name (str): Name of the function, for debugging purposes.
        extra_code (str): Additional code to be included at global scope.
        block_shape: The 2D shape of the thread block (y,x) with which the
            kernel is launched. This is a performance tuning parameter.
            If None, a heuristic is used to select the block shape.
            Changes to this parameter do _not_ require re-compilation of
            the kernel.
        block_axes: List of axis indices (or names) specifying the 2
            computation axes to which the thread block (y,x) is mapped.
            This is a performance tuning parameter. If None, a heuristic
            is used to select the block axes. Values may be negative for
            reverse indexing. Changes to this parameter _do_ require
            re-compilation of the kernel.

    Note:
        Only GPU computation is currently supported.

    Examples::

        # Add two arrays together
        bf.map("c = a + b", {'c': c, 'a': a, 'b': b})

        # Compute outer product of two arrays
        bf.map("c(i,j) = a(i) * b(j)",
               {'c': c, 'a': a, 'b': b},
               axis_names=('i','j'))

        # Split the components of a complex array
        bf.map("a = c.real; b = c.imag", {'c': c, 'a': a, 'b': b})

        # Raise an array to a scalar power
        bf.map("c = pow(a, p)", {'c': c, 'a': a, 'p': 2.0})

        # Slice an array with a scalar index
        bf.map("c(i) = a(i,k)", {'c': c, 'a': a, 'k': 7}, ['i'], shape=c.shape)
    """
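The last docstring example is worth unpacking: with a scalar k, the expression "c(i) = a(i,k)" copies column k of the 2D array a into the 1D array c, for every i. In NumPy terms (a CPU sketch for intuition only; the array contents and the index k = 2 here are made up for illustration), that is simply a column slice:

```python
import numpy as np

a = np.arange(20).reshape(4, 5)  # a 2D array of shape (4, 5)
k = 2                            # scalar column index (the docstring uses k=7)

# What bf.map("c(i) = a(i,k)", ...) computes for each i
c = a[:, k]
print(c)    # [ 2  7 12 17]
```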
Let’s look a bit closer at that outer product example. Here, following the convention of summation notation, the indices ‘i’ and ‘j’ on the two arrays a and b produce an outer product. A full example:
import bifrost as bf
import numpy as np

# Create two arrays on the GPU, a and b, and an empty 5x5 output c
a = bf.ndarray([1, 2, 3, 4, 5], space='cuda')
b = bf.ndarray([1, 0, 1, 0, 1], space='cuda')
c = bf.ndarray(np.zeros((5, 5)), space='cuda')

# Compute the outer product
bf.map("c(i,j) = a(i) * b(j)",
       axis_names=['i', 'j'],
       data={'c': c, 'a': a, 'b': b})
print(c)
# ndarray([[ 1.,  0.,  1.,  0.,  1.],
#          [ 2.,  0.,  2.,  0.,  2.],
#          [ 3.,  0.,  3.,  0.,  3.],
#          [ 4.,  0.,  4.,  0.,  4.],
#          [ 5.,  0.,  5.,  0.,  5.]])
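This result is exactly the outer product in the NumPy sense. As a quick cross-check (again a CPU sketch with NumPy, not bifrost code), np.outer produces the same values:

```python
import numpy as np

a = np.array([1., 2., 3., 4., 5.])
b = np.array([1., 0., 1., 0., 1.])

# Same values as the bf.map outer-product call above, computed on the CPU
c = np.outer(a, b)
print(c.shape)    # (5, 5)
print(c[1])       # [2. 0. 2. 0. 2.]
```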
The first example of c = a + b could also be written more explicitly as:
bf.map("c(i) = a(i) + b(i)", axis_names=['i'], data={'c': c, 'a': a, 'b': b})
Note, however, that implicit indexing should be preferred where possible, as explicit indexing may exhibit worse performance.