Hey guys, can anyone point me in the right direction for C++ with GPU optimization and CUDA integration/implementation?
C++ won't help you here; the GPU doesn't care about any fancy object-oriented programming. Besides, C++ isn't a magical unicorn that solves everything perfectly. Anyone who has spent more than 20 years learning it will tell you they're still learning it, and if they don't, they're either lying or unaware that there's always something new to learn about it. Which makes them really bad developers.
How to learn what you want really depends on your background. Try this:
1. Run a hello-world program that prints from the GPU (see the first sketch after this list).
2. Before diving into advanced cryptographic problems, decompose everything that just happened:
a. Understand what the host code did, and draw the line between your C++ (or whatever) code and the GPU code.
b. Understand what the GPU did: dump the CUBIN, check the SASS ops (cuobjdump -sass will show them), learn about registers, shared memory, and constant memory. Basically, read the manual.
c. Port your hello-world host code to a different programming language (maybe Python), but load and run the exact same kernel as the first time. Notice that it has nothing to do with C++ this time around, since you're just loading a CUBIN onto the GPU (see the second sketch after this list).
d. Now you're a pro - apply for lead tech jobs at emerging DeFi startups. Just kidding. You're on your way to fighting compiler bugs and unexpected behaviour, spending sleepless nights scrolling through NVIDIA forums, and profiling a dozen variations of an algorithm to see which one runs faster.
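For step 1, here's a minimal sketch of what I mean (the file name and launch configuration are just examples). The comments mark where the host code ends and the device code begins, which is most of step 2a:

```cuda
// hello.cu -- build with: nvcc hello.cu -o hello
#include <cstdio>

// Device code: runs on the GPU, one printf per thread.
// extern "C" keeps the name unmangled so the second sketch can look it up by name.
extern "C" __global__ void hello_kernel() {
    printf("Hello from GPU thread %u in block %u\n", threadIdx.x, blockIdx.x);
}

int main() {
    // Host code: everything in main() runs on the CPU.
    // The <<<blocks, threads>>> launch below is the line between the two worlds.
    hello_kernel<<<2, 4>>>();

    // Device-side printf output is buffered; synchronizing flushes it
    // before the program exits.
    cudaDeviceSynchronize();
    return 0;
}
```

For step 2b, something like nvcc -cubin -arch=sm_80 hello.cu -o hello.cubin (swap the arch for whatever your GPU actually is) gives you a standalone CUBIN, and cuobjdump -sass hello.cubin dumps the SASS so you can stare at what the GPU really executes.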
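And for step 2c, a sketch of the loading side, assuming the hello.cubin produced above. The step really means a different language, so picture doing this from Python with PyCUDA or NVIDIA's cuda-python bindings; I've written it against the raw CUDA driver API in C++ just to keep one language in this post, but the cuInit / cuModuleLoad / cuLaunchKernel flow is exactly what those wrappers call, and notice there's no trace of the original C++ source in it:

```cuda
// load_cubin.cpp -- build with: nvcc load_cubin.cpp -o load_cubin -lcuda
// (links against the CUDA driver API, not the runtime)
#include <cuda.h>
#include <cstdio>
#include <cstdlib>

// Tiny error check so failures don't pass silently.
#define CHECK(call)                                                  \
    do {                                                             \
        CUresult err = (call);                                       \
        if (err != CUDA_SUCCESS) {                                   \
            const char *msg = nullptr;                               \
            cuGetErrorString(err, &msg);                             \
            fprintf(stderr, "%s failed: %s\n", #call,                \
                    msg ? msg : "unknown error");                    \
            exit(1);                                                 \
        }                                                            \
    } while (0)

int main() {
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction kernel;

    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuCtxCreate(&ctx, 0, dev));

    // Load the precompiled CUBIN -- no C++ source involved at this point.
    CHECK(cuModuleLoad(&mod, "hello.cubin"));
    CHECK(cuModuleGetFunction(&kernel, mod, "hello_kernel"));

    // Same launch configuration as before: 2 blocks of 4 threads, no arguments.
    CHECK(cuLaunchKernel(kernel, 2, 1, 1, 4, 1, 1, 0, nullptr, nullptr, nullptr));
    CHECK(cuCtxSynchronize());  // flushes the device-side printf output

    CHECK(cuModuleUnload(mod));
    CHECK(cuCtxDestroy(ctx));
    return 0;
}
```

The only contract between the two programs is the CUBIN and the kernel name.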
Good luck.