Debugging CUDA on Windows can be a challenge. When a program crashes because of an invalid memory access, the Nsight debugger does not show where the error occurred: all CUDA threads exit and no output is produced. You can add printf calls to the kernel to narrow down where and why it is crashing, but there is another way.
The CUDA Toolkit includes a tool called cuda-memcheck that provides a traceback of the crash. It works with both console and GUI applications.
Example: In your kernel, add a parameter “float * A”. On the host (CPU), allocate a device array of 100 floats and pass it to the kernel. Add the code “A[1090000] = 0;” to the kernel. Run the program, and it should crash with an illegal memory access error:
“ErrorIllegalAddress: While executing a kernel, the device encountered a load or store instruction on an invalid memory address.
This leaves the process in an inconsistent state and any further CUDA work will return the same error.”
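The example above can be sketched as a small standalone program. This is only a minimal illustration: the kernel name crashKernel and the variable names are hypothetical, and whether the out-of-bounds store actually faults when run without cuda-memcheck may depend on how the driver lays out device memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes far past the end of a 100-element array.
__global__ void crashKernel(float *A)
{
    A[1090000] = 0;  // deliberate out-of-bounds store
}

int main()
{
    const int n = 100;
    float *dA = nullptr;

    // Allocate the 100-element array in device memory.
    cudaError_t err = cudaMalloc(&dA, n * sizeof(float));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // A single thread is enough to trigger the invalid access.
    crashKernel<<<1, 1>>>(dA);

    // Kernel launches are asynchronous, so a fault inside the kernel
    // is reported by a later call such as cudaDeviceSynchronize.
    err = cudaDeviceSynchronize();
    printf("after kernel: %s\n", cudaGetErrorString(err));

    // If the kernel faulted, later CUDA calls in this process are
    // expected to report the same error.
    float *dB = nullptr;
    cudaError_t err2 = cudaMalloc(&dB, n * sizeof(float));
    printf("next CUDA call: %s\n", cudaGetErrorString(err2));

    // Cleanup is best effort; these calls may fail after a fault.
    cudaFree(dB);
    cudaFree(dA);
    return err == cudaSuccess ? 0 : 1;
}
```

Building with nvcc and the -lineinfo option (or -G for a debug build) gives cuda-memcheck the source information it needs to report file names alongside the kernel name. On recent CUDA Toolkit releases, compute-sanitizer takes the place of cuda-memcheck and is run the same way.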
To use cuda-memcheck, try the following steps:
a) Run your program under cuda-memcheck from a command-line shell.
b) Recreate the crash. In the cuda-memcheck output, look for a line of the form “at 0xYYYYY in ZZZZZZ:KERNEL”, where 0xYYYYY is the hex offset of the faulting instruction, ZZZZZZ is the full path of the CUDA source file, and KERNEL is the name of your kernel.
c) Debug your program in Visual Studio using the Nsight Legacy debugger. Set a breakpoint at the beginning of your kernel and run to it. Right-click in the source code and select “Go to Disassembly”. In the Address text box, enter KERNEL+0xYYYYY (substituting your kernel's name for KERNEL and the offset reported by cuda-memcheck for 0xYYYYY). The disassembly then shows the instruction that caused the crash.