What are your program flags: your -d setting and your range? If -d is too low, you will get many collisions/dead kangaroos, and your speed will also be much lower.
I am using the default settings and the 40-bit range from the bundled sample in.txt file.
Yeah, that is the problem. GPUs are not made for a 40-bit range; a CPU can get through that in about 1 second. You need to bump your range up to something like 80 bits or more and let the GPU open up. Change your input file to this:
800000000000000000000
FFFFFFFFFFFFFFFFFFFFFFF
037e1238f7b1ce757df94faa9a2eb261bf0aeb9f84dbf81212104e78931c2a19dc
and rerun your program
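To sanity-check this advice before a long run, here is a minimal Python sketch. It is not part of Kangaroo itself. It assumes the input file layout shown above (start of range in hex, end of range in hex, then one public key per line). It also uses the usual rough estimate for the kangaroo method: about 2·√N jumps for a range of width N, plus roughly (number of kangaroos) × 2^d extra jumps to reach the next distinguished point after a collision. The function names and the herd size of 2^20 kangaroos are hypothetical, and the estimate ignores the memory cost of a very low -d. With a herd that large, the overhead term swamps the roughly 2^21 jumps a 40-bit range needs, which is one way to see why the GPU gains nothing there.

```python
import math

def parse_kangaroo_input(text):
    """Split an in.txt-style file into (start, end, public_keys).

    Assumed layout: start (hex), end (hex), then one public key per line.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 3:
        raise ValueError("expected start, end and at least one public key")
    start, end = int(lines[0], 16), int(lines[1], 16)
    if end <= start:
        raise ValueError("end of range must be greater than start")
    keys = lines[2:]
    for key in keys:
        int(key, 16)  # raises ValueError if the key is not hex
        compressed = len(key) == 66 and key[:2] in ("02", "03")
        uncompressed = len(key) == 130 and key[:2] == "04"
        if not (compressed or uncompressed):
            raise ValueError(f"not a SEC-encoded public key: {key}")
    return start, end, keys

def expected_jumps(width, n_kangaroos, dp_bits):
    """Rough jump count: ~2*sqrt(width) plus n_kangaroos * 2**dp_bits DP overhead."""
    return 2.0 * math.sqrt(width) + n_kangaroos * 2.0 ** dp_bits

SAMPLE = """800000000000000000000
FFFFFFFFFFFFFFFFFFFFFFF
037e1238f7b1ce757df94faa9a2eb261bf0aeb9f84dbf81212104e78931c2a19dc
"""

if __name__ == "__main__":
    start, end, keys = parse_kangaroo_input(SAMPLE)
    width = end - start
    print(f"{len(keys)} key(s), range width about 2^{width.bit_length()}")
    k = 2 ** 20  # hypothetical number of GPU kangaroos
    for d in (10, 20):
        print(f"-d {d}: ~2^{math.log2(expected_jumps(width, k, d)):.1f} jumps")
    # Same herd on a 40-bit range: the DP overhead term dominates.
    print(f"40-bit, -d 10: ~2^{math.log2(expected_jumps(2 ** 40, k, 10)):.1f} jumps")
```

Swap SAMPLE for the contents of your own in.txt to see how the range and -d trade off for your card.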