Here is the file with the new code.
It now takes about 6 sec to transfer the kangaroos, so try a timeout of ~10 sec. I don't know if CUDA can do parallel transfers on multiple GPUs...
I'll add a way to set the timeout in the next release and see if I can speed this data transfer up further...
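
To illustrate what a configurable timeout could look like, here is a minimal C++ sketch. It is not the actual code from the program: `finishedWithin` and its parameters are hypothetical names made up for this example, and the "transfer" is just any callable. It runs the transfer on a background thread and reports whether it completed before the given timeout.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// Hypothetical helper (an assumption for illustration, not the program's API):
// starts `transfer` on a background thread and returns true if it completed
// within `timeout`, false otherwise. On a timeout the background thread is
// left running detached; a real implementation would need to cancel or join it.
template <typename Fn>
bool finishedWithin(Fn transfer, std::chrono::milliseconds timeout) {
    assert(timeout.count() >= 0);

    // packaged_task gives us a future that does not block on destruction,
    // unlike the future returned by std::async.
    std::packaged_task<void()> task(std::move(transfer));
    std::future<void> done = task.get_future();
    std::thread(std::move(task)).detach();

    return done.wait_for(timeout) == std::future_status::ready;
}
```

With a transfer that takes about 6 sec, calling this with `std::chrono::milliseconds(10000)` would be expected to return true, while a timeout shorter than the transfer would return false. The timeout value itself could then come from a command-line option or config setting instead of being hard-coded.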