You will notice that you get back 4 results, although the job has 5 winning nonces. That is because the A1 has an output queue of 4 elements, so one of the results is overwritten. To get them all, you need to pull the results early enough, while the chip is still hashing. Have a look at the reference driver to see how to do this continuously.
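The overwrite effect can be modelled with a bounded queue. This is a minimal Python sketch, not the reference driver's code. It assumes that the oldest unread result is the one lost when the queue is full (the post only says that one result is overwritten). The function names and nonce values are hypothetical.

```python
from collections import deque

QUEUE_DEPTH = 4  # output queue size quoted for the A1


def read_after_job(found):
    """Host reads the queue only once the whole job has finished."""
    queue = deque(maxlen=QUEUE_DEPTH)  # a full deque silently drops its oldest entry
    for nonce in found:
        queue.append(nonce)            # chip reports a winning nonce
    return list(queue)


def read_while_hashing(found):
    """Host drains the queue after every result, while the chip keeps hashing."""
    queue = deque(maxlen=QUEUE_DEPTH)
    collected = []
    for nonce in found:
        queue.append(nonce)
        while queue:                   # poll before the queue can fill up
            collected.append(queue.popleft())
    return collected


# Five winning nonces; placeholder values, not real A1 output.
found = [0x11111111, 0x22222222, 0x33333333, 0x44444444, 0x55555555]
print(len(read_after_job(found)))      # 4: the first nonce was overwritten
print(len(read_while_hashing(found)))  # 5
```

Polling after every result is the extreme case. A real driver only needs to read often enough that no more than 4 results accumulate between reads.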
Thank you! I had indeed switched over to cgminer, but getting the test to run correctly makes me feel a lot better about the state of my hardware.

Interestingly, I get six nonces: the four you posted, the one I assume was overwritten, and one extra. I get the same results on all four chips (I haven't tried the second board).
************** Got nonce! 18 8d b1 99
************** Got nonce! 3a a6 b2 0c
************** Got nonce! 3f 8f 64 de
************** Got nonce! b9 9c c7 09
************** Got nonce! be 7b 58 b3
************** Got nonce! ec c1 4e 74
What you get back from the chip is a valid Diff1 share, while your pool is obviously asking for higher-difficulty shares. That's perfectly normal: you will see this trace log with any hardware that produces Diff1 shares. cgminer then drops every share below the pool's difficulty.
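To illustrate the filtering step, here is a simplified Python sketch of how a share's difficulty relates to its hash and how a pool-difficulty check could look. It is not cgminer's actual code. It assumes the usual Bitcoin convention that the 32-byte double-SHA256 is read as a little-endian 256-bit integer and that the difficulty-1 target is 0xFFFF * 2^208. The function names and the pool difficulty of 16 are hypothetical.

```python
# Difficulty-1 target used by Bitcoin pools: 0xFFFF * 2**208.
DIFF1_TARGET = 0xFFFF << 208


def share_difficulty(hash_bytes):
    """Difficulty of a share, given the 32-byte double-SHA256 of its header.

    The hash is read as a little-endian 256-bit integer; a smaller value
    means a higher difficulty.
    """
    value = int.from_bytes(hash_bytes, "little")
    return DIFF1_TARGET / value


def keep_for_pool(hash_bytes, pool_difficulty):
    """Hypothetical filter: submit only shares that meet the pool's difficulty."""
    return share_difficulty(hash_bytes) >= pool_difficulty


# A hash exactly at the Diff1 target, and one 32 times smaller.
diff1_hash = DIFF1_TARGET.to_bytes(32, "little")
strong_hash = (DIFF1_TARGET >> 5).to_bytes(32, "little")

print(share_difficulty(diff1_hash), keep_for_pool(diff1_hash, 16))    # 1.0 False
print(share_difficulty(strong_hash), keep_for_pool(strong_hash, 16))  # 32.0 True
```

Every nonce the chip reports passes the Diff1 check, but with a pool difficulty of 16 only roughly one in 16 of them would be kept.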
OK, perfect. I figured the target difficulty wasn't right, given that it's fixed in the job creation function. I will integrate your new code at some point. Until then, you're right: getting confirmation that things work is very helpful!
First, however, it's time to do some reliability testing with the test vector.
