cuda_graph

argsort(iterable, key)

Sort the list of tensors according to the provided key function.

:param iterable: iterable object to sort
:param key: key function used to order the elements
:return: indices that sort the iterable object
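A minimal pure-Python sketch of what `argsort` computes (the real function operates on tensors, but the index logic is the same):

```python
def argsort(iterable, key):
    # Return the indices that would sort `iterable` according to `key`,
    # without reordering the iterable itself.
    return [i for i, _ in sorted(enumerate(iterable), key=lambda pair: key(pair[1]))]

# Example: sort by absolute value; the indices point into the original list.
indices = argsort([-5, 2, -1], key=abs)  # -> [2, 1, 0]
```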

cuda_graphs_wrapper(model, inputs)

Wrapper to run the model with CUDA graphs.

:param model: model to capture as a CUDA graph
:param inputs: inputs to the model
:return: an inference function that runs the model through CUDA graph replay
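The capture/replay contract can be illustrated with a toy CPU sketch. Everything here is an illustrative stand-in (`cuda_graphs_wrapper_sketch`, plain lists as static buffers, re-running the model in place of graph replay); the real implementation builds on `torch.cuda.CUDAGraph`. The key property it mimics: new inputs must be copied in place into the static buffers the graph was captured with, and results come back in the same static output buffers.

```python
def cuda_graphs_wrapper_sketch(model, example_inputs):
    # "Static" input buffers: the graph is captured against these exact
    # objects, so later calls must copy data into them in place.
    static_inputs = [list(t) for t in example_inputs]
    # "Capture": run once to allocate the static output buffers.
    static_outputs = model(static_inputs)

    def run(*new_inputs):
        # Copy the new data into the captured static input buffers.
        for dst, src in zip(static_inputs, new_inputs):
            dst[:] = src
        # "Replay": recompute into the same static output buffers.
        static_outputs[:] = model(static_inputs)
        return static_outputs

    return run

# Hypothetical model: doubles every element of its single input.
model = lambda inputs: [[x * 2 for x in inputs[0]]]
run = cuda_graphs_wrapper_sketch(model, [[1, 2, 3]])
result = run([4, 5, 6])  # -> [[8, 10, 12]]
```

Note that the caller always receives the same output objects back; a real CUDA-graph wrapper has the same aliasing behavior, which is why callers must consume or clone outputs before the next replay.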

get_pool_size(inputs, existing_pools)

Get the size of the memory pool to use for the CUDA graphs:

- the pool should be at least as large as the largest existing pool;
- if the pool size is below 1 GB, round it up to the next power of 2 to avoid creating many unusable small pools.

:param inputs: list of inputs to be copied into the pool
:param existing_pools: list of existing pools
:return: size of the pool in bytes
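A hypothetical sketch of the sizing rule above (the function signature, taking raw byte counts instead of tensors and pools, and the exact rounding expression are assumptions; only the two bullet points come from the source):

```python
def get_pool_size(input_nbytes, existing_pool_sizes):
    # The pool must hold every input and be at least as large as the
    # largest existing pool.
    size = max(sum(input_nbytes), max(existing_pool_sizes, default=0))
    one_gb = 1 << 30
    if size < one_gb:
        # Round up to the next power of 2 so we do not accumulate many
        # small, unusable pools.
        size = 1 << max(size - 1, 0).bit_length()
    return size

# 300 bytes of inputs, no existing pools -> rounded up to 512 bytes.
pool_size = get_pool_size([300], [])  # -> 512
```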

prepare_inputs(inputs, pools)

Copy the inputs into the CUDA graphs memory pool and return the tensor copies. Follows a greedy bin-packing algorithm (first-fit decreasing) to minimize the number of pools:

- sort the items in decreasing order of size;
- insert each item into the first pool that has room for it.

:param inputs: list of tensors to copy into the pool
:param pools: list of available pools
:return: copies of the input tensors with their underlying storage in the memory pool
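The first-fit-decreasing strategy can be sketched with plain byte counts (`first_fit_decreasing`, a fixed per-pool capacity, and returning lists of sizes are illustrative simplifications; the real function copies tensors into growable memory pools):

```python
def first_fit_decreasing(sizes, capacity):
    # Greedy bin packing: place each item, largest first, into the first
    # bin with enough remaining room; open a new bin when none fits.
    bins = []  # each bin: [remaining_capacity, [item sizes]]
    for size in sorted(sizes, reverse=True):
        for b in bins:
            if b[0] >= size:
                b[0] -= size
                b[1].append(size)
                break
        else:
            bins.append([capacity - size, [size]])
    return [contents for _, contents in bins]

# Four tensors of 5, 3, 4 and 2 bytes packed into two 8-byte pools.
pools = first_fit_decreasing([5, 3, 4, 2], capacity=8)  # -> [[5, 3], [4, 2]]
```

Sorting in decreasing order first matters: placing the large items before the small ones leaves the small items to fill the leftover gaps, which typically yields fewer pools than inserting in arbitrary order.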