NVRTC supported compile options: https://docs.nvidia.com/cuda/nvrtc/index.html#supported-compile-options

It would be useful to be able to pass additional compiler flags to NVRTC, for example to control floating-point math. The CCCL.c layer already provides this facility; we just need to plumb it through to the user.

The API could look something like this, set globally:

```python
set_nvrtc_compile_options({"fmad": "true"})
```

Or on a per-algorithm basis:

```python
reducer = make_reduce_into(d_in, d_out, op, h_init, num_items, nvrtc_compile_options={"fmad": "true"})
```