7. Fast GPU math using bfMap

A key under-the-hood component of bifrost is the map function, which provides a simple way to do fast arithmetic on the GPU. To get acquainted, we will ignore all of the pipeline infrastructure that bifrost provides and call the map function directly.

Let’s suppose you have two ndarrays and wish to add them together on the GPU. Here is how you would do that with map:

import numpy as np
import bifrost as bf

# Create two arrays on the GPU, A and B, and an empty output C
a = bf.ndarray([1,2,3,4,5], space='cuda')
b = bf.ndarray([1,0,1,0,1], space='cuda')
c = bf.ndarray(np.zeros(5), space='cuda')

# Add them together
bf.map("c = a + b", data={'c': c, 'a': a, 'b': b})
print(c)
# ndarray([ 2.,  2.,  4.,  4.,  6.])

The map function works out which operation to run on the GPU from the string you give it, so other operators and in-place assignments work the same way. These would also work:

bf.map("c = a - b", data={'c': c, 'a': a, 'b': b})
bf.map("c = a * b", data={'c': c, 'a': a, 'b': b})
bf.map("c = a / b", data={'c': c, 'a': a, 'b': b})
bf.map("c += a + b", data={'c': c, 'a': a, 'b': b})
bf.map("c /= a + b", data={'c': c, 'a': a, 'b': b})
bf.map("c *= a + b", data={'c': c, 'a': a, 'b': b})
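The element-wise meaning of these strings can be checked on the CPU with plain NumPy. The sketch below is only an analogue: it does not call bifrost, and it assumes that map applies each operator element by element, as the output of the first example suggests. The names acc and scaled are introduced here purely for illustration.

```python
import numpy as np

# Same values as the bifrost example, held in host memory (assumption:
# float64 is used throughout to match the float output shown above)
a = np.array([1, 2, 3, 4, 5], dtype=np.float64)
b = np.array([1, 0, 1, 0, 1], dtype=np.float64)

# Analogue of "c = a + b"
c = a + b

# Analogue of "c += a + b": accumulate into the existing contents of c
acc = c.copy()
acc += a + b

# Analogue of "c *= a + b": scale the existing contents of c
scaled = c.copy()
scaled *= a + b

print(c)
print(acc)
print(scaled)
```

The in-place forms read the current contents of c before writing, so their result depends on what c held beforehand.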

Getting deeper requires a look at the docstring:

def map(func_string, *args, **kwargs):
    """Apply a function to a set of ndarrays.

    Args:
      func_string (str): The function to apply to the arrays, as a string (see
                   below for examples).
      data (dict): Map of string names to ndarrays or scalars.
      axis_names (list): List of string names by which each axis is referenced
                   in func_string.
      shape:       The shape of the computation. If None, the broadcast shape
                   of all data arrays is used.
      func_name (str): Name of the function, for debugging purposes.
      extra_code (str): Additional code to be included at global scope.
      block_shape: The 2D shape of the thread block (y,x) with which the kernel
                   is launched.
                   This is a performance tuning parameter.
                   If NULL, a heuristic is used to select the block shape.
                   Changes to this parameter do _not_ require re-compilation of
                   the kernel.
      block_axes:  List of axis indices (or names) specifying the 2 computation
                   axes to which the thread block (y,x) is mapped.
                   This is a performance tuning parameter.
                   If NULL, a heuristic is used to select the block axes.
                   Values may be negative for reverse indexing.
                   Changes to this parameter _do_ require re-compilation of the
                   kernel.

    Note:
        Only GPU computation is currently supported.

    Examples::

      # Add two arrays together
      bf.map("c = a + b", {'c': c, 'a': a, 'b': b})

      # Compute outer product of two arrays
      bf.map("c(i,j) = a(i) * b(j)",
             {'c': c, 'a': a, 'b': b},
             axis_names=('i','j'))

      # Split the components of a complex array
      bf.map("a = c.real; b = c.imag", {'c': c, 'a': a, 'b': b})

      # Raise an array to a scalar power
      bf.map("c = pow(a, p)", {'c': c, 'a': a, 'p': 2.0})

      # Slice an array with a scalar index
      bf.map("c(i) = a(i,k)", {'c': c, 'a': a, 'k': 7}, ['i'], shape=c.shape)
    """
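To make the docstring examples concrete, here is a rough CPU-side analogue of three of them in plain NumPy. It does not call bifrost; the array contents, the index k = 1, and the names z, re, im, squares, grid, and column are hypothetical values chosen for illustration.

```python
import numpy as np

# Analogue of "a = c.real; b = c.imag": split a complex array
z = np.array([1 + 2j, 3 - 4j])
re = z.real.copy()
im = z.imag.copy()

# Analogue of "c = pow(a, p)" with the scalar p = 2.0
squares = np.power(np.array([1.0, 2.0, 3.0]), 2.0)

# Analogue of "c(i) = a(i,k)" with a scalar index k:
# for every i, pick element k along the second axis
grid = np.arange(12.0).reshape(3, 4)
k = 1
column = grid[:, k]

print(re, im)
print(squares)
print(column)
```

In the last case the output has fewer axes than the input, which is presumably why the docstring example passes shape=c.shape explicitly rather than relying on the broadcast shape of all the data arrays.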

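Let’s look a bit closer at that outer product example. Here the names ‘i’ and ‘j’ given in axis_names label the axes of the computation, in the style of index notation: a is indexed by ‘i’, b by ‘j’, and c by both, so each pair (i, j) produces one element of the outer product. A full example: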

import numpy as np
import bifrost as bf

# Create two arrays on the GPU, A and B, and an empty output C
a = bf.ndarray([1,2,3,4,5], space='cuda')
b = bf.ndarray([1,0,1,0,1], space='cuda')
c = bf.ndarray(np.zeros((5, 5)), space='cuda')

# Compute outer product
bf.map("c(i,j) = a(i) * b(j)",
       axis_names=['i', 'j'],
       data={'c': c, 'a': a, 'b': b})
print(c)

# ndarray([[ 1.,  0.,  1.,  0.,  1.],
#          [ 2.,  0.,  2.,  0.,  2.],
#          [ 3.,  0.,  3.,  0.,  3.],
#          [ 4.,  0.,  4.,  0.,  4.],
#          [ 5.,  0.,  5.,  0.,  5.]])
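One way to read the indexed string is as the body of a loop nest over the named axes. The NumPy sketch below spells that out on the CPU and compares it with np.outer. It is an illustration of the indexing convention only and makes no claim about how bifrost runs the computation on the GPU; c_loop and c_outer are names introduced here.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5], dtype=np.float64)
b = np.array([1, 0, 1, 0, 1], dtype=np.float64)

# "c(i,j) = a(i) * b(j)" written as an explicit loop over both axes
c_loop = np.zeros((5, 5))
for i in range(5):
    for j in range(5):
        c_loop[i, j] = a[i] * b[j]

# The same result from NumPy's built-in outer product
c_outer = np.outer(a, b)

print(np.array_equal(c_loop, c_outer))
print(c_outer)
```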

The first example of c = a + b could also be written more explicitly as:

bf.map("c(i) = a(i) + b(i)", axis_names=['i'], data={'c': c, 'a': a, 'b': b})

Note, however, that implicit indexing should be preferred where possible, as explicit indexing may exhibit worse performance.