Questions for developers:
1. Why is the memory for the hash allocated in global memory rather than in GPU shared memory? (A sketch of what I mean is after this list.)
2. Can OpenMP be used to parallelize tasks on the CPU, e.g. validation? (See the OpenMP sketch after this list.)
3. Has anyone thought of splitting the host<->GPU communication into 2 separate threads? Basically, one thread constantly and asynchronously reads data from the stratum pool and copies it into a buffer on the GPU; the GPU kernel does all the calculations and saves the results to another buffer on the GPU; a second thread constantly and asynchronously reads the calculated data from the GPU buffer into a host buffer, validates it on the CPU and sends it over the network to the pool. The idea is to keep the GPU kernel from waiting for the host. (See the streams sketch after this list.)
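
To illustrate question 1, here is a minimal sketch of what I mean by keeping the per-thread hash state in on-chip shared memory and only writing the final result to global memory; HASH_WORDS, hash_out and hash_kernel are placeholder names I made up, not the project's actual code:

```cuda
#include <stdint.h>

#define HASH_WORDS 8   // e.g. 8 x 32-bit words for a 256-bit hash (assumption)

// Hypothetical kernel fragment: the intermediate hash state lives in a
// per-block __shared__ staging area instead of a global buffer, and only
// the finished result is written back to global memory.
__global__ void hash_kernel(uint32_t *hash_out /* result buffer in global memory */)
{
    // One slot of shared memory per thread in the block (assumes blockDim.x <= 256).
    __shared__ uint32_t state[256 * HASH_WORDS];
    uint32_t *my_state = &state[threadIdx.x * HASH_WORDS];

    // ... compute the hash into my_state (fast on-chip memory) ...

    // Write the finished result back to global memory once, at the end.
    uint32_t gid = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = 0; i < HASH_WORDS; ++i)
        hash_out[gid * HASH_WORDS + i] = my_state[i];
}
```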
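
For question 2, a rough sketch of what OpenMP validation could look like on the host side; candidate_t and validate_share() are placeholders for whatever the miner really uses, and I am assuming the host code can be built with OpenMP enabled (e.g. nvcc -Xcompiler -fopenmp):

```cuda
#include <omp.h>
#include <stdint.h>

// Placeholder result type -- the real miner's candidate/share struct goes here.
typedef struct { uint32_t nonce; uint8_t hash[32]; } candidate_t;

// Placeholder check: a real miner would re-hash the header with this nonce on
// the CPU and compare the result against the pool's target.
static int validate_share(const candidate_t *c) { return c->hash[31] == 0; }

void validate_batch(const candidate_t *cands, int n, int *ok)
{
    // Each candidate is independent, so the loop parallelizes trivially.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        ok[i] = validate_share(&cands[i]);
}
```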
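
For question 3, a rough sketch of the overlap idea using two CUDA streams and pinned host buffers, which give the same effect as two dedicated threads (copies and kernels in one stream overlap with CPU work on the other buffer); job_t, result_t and search_kernel() are made-up placeholders, not the real miner's types:

```cuda
#include <cuda_runtime.h>
#include <stdint.h>

typedef struct { uint8_t header[80]; uint64_t target; } job_t;      // placeholder
typedef struct { uint32_t nonce; uint8_t found; } result_t;         // placeholder

__global__ void search_kernel(const job_t *job, result_t *res) { /* ... */ }

void mining_loop(void)
{
    job_t    *h_job;  job_t    *d_job[2];
    result_t *h_res;  result_t *d_res[2];
    cudaStream_t stream[2];

    cudaMallocHost(&h_job, 2 * sizeof(job_t));     // pinned memory, required for async copies
    cudaMallocHost(&h_res, 2 * sizeof(result_t));
    for (int i = 0; i < 2; ++i) {
        cudaMalloc(&d_job[i], sizeof(job_t));
        cudaMalloc(&d_res[i], sizeof(result_t));
        cudaStreamCreate(&stream[i]);
    }

    for (int iter = 0; ; ++iter) {                 // run until the miner is stopped
        int b = iter & 1;                          // ping-pong between the two buffers
        // h_job[b] would be filled from stratum by the network thread.
        cudaMemcpyAsync(d_job[b], &h_job[b], sizeof(job_t),
                        cudaMemcpyHostToDevice, stream[b]);
        search_kernel<<<1024, 256, 0, stream[b]>>>(d_job[b], d_res[b]);
        cudaMemcpyAsync(&h_res[b], d_res[b], sizeof(result_t),
                        cudaMemcpyDeviceToHost, stream[b]);

        // While stream[b] is busy, validate and submit the previous batch on the CPU.
        int prev = 1 - b;
        cudaStreamSynchronize(stream[prev]);
        // ... check h_res[prev] on the CPU and send it to the pool ...
    }
}
```

The point is that the copies and the kernel in one stream run while the CPU validates and submits the previous batch, so the GPU should never sit idle waiting for the host.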
What do more experienced developers think about these ideas? I am a lousy programmer, so do not expect a proof of concept from my side any time soon :-(