Board Mining software (miners)
Re: Official CGMINER thread - CPU/GPU miner in C for linux/windows/osx
by
d3m0n1q_733rz
on 31/07/2011, 00:26:38 UTC
Hey, I've been working on the hashing asm, as I said before: removing redundant functions and register moves, reordering sources and destinations to take advantage of processor hardware optimizations, and doing some of the easy math myself so the processor doesn't have to.  Here's what I've done so far.  It's not much, but it works.  Don't go changing the github source just yet though.  For now, copy-paste this to replace your existing sha256_sse4_amd64.asm file.  For those of you without SSE4.1 (such as AMD users), copy-paste this into your sse2_amd64 file instead and search-replace all uses of movntdqa with movdqa so the quick memory moves aren't used.

I pasted your ASM into sha256_xmm_amd64.asm and changed "movntdqa" to "movdqa" like you said for sse2. But I get a linker error.
Code:
...
cgminer-sha256_sse2_amd64.o: In function `scanhash_sse2_64':
sha256_sse2_amd64.c:(.text+0x4fb): undefined reference to `CalcSha256_x64'
sha256_sse2_amd64.c:(.text+0x50b): undefined reference to `CalcSha256_x64'
collect2: ld returned 1 exit status
...

I had to change "CalcSha256_x64_sse4" to "CalcSha256_x64" in two spots. Then the compile went just fine. I'm running it now to see if it's any faster and whether any work actually gets accepted, but hopefully it's bug free.
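For anyone who would rather not edit the asm, the same mismatch can in principle be handled on the C side by aliasing the name the C file calls to the name the asm exports. A minimal sketch of the idea, with a stand-in C function in place of the real asm routine; the signature, the body and the helper names call_hash_routine and check_alias are made up for illustration and are not cgminer's:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the routine the pasted .asm exports.
 * Hypothetical signature and body, only here so the sketch links on its own. */
uint32_t CalcSha256_x64_sse4(uint32_t x)
{
    return x + 1u;
}

/* The C file calls CalcSha256_x64, but the object only provides
 * CalcSha256_x64_sse4. Mapping one name to the other in the preprocessor
 * removes the "undefined reference" without touching the asm. */
#define CalcSha256_x64 CalcSha256_x64_sse4

/* Hypothetical caller, playing the role of scanhash_sse2_64. */
uint32_t call_hash_routine(uint32_t x)
{
    return CalcSha256_x64(x); /* resolves to CalcSha256_x64_sse4 */
}

/* Small self-check for the sketch. */
void check_alias(void)
{
    assert(call_hash_routine(41u) == 42u);
}
```

Renaming the symbol in the asm, as described above, does the same job; the alias just keeps the pasted file untouched.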

btw, doesn't the assembler do basic inline math before assembling?

P.S. Hashrate looks really close to the same but I did get a work unit accepted just now.

EDIT: The speed increase, if any, is around 1%, maybe slightly more. I only have two cores at 3.5 Mh/s each, so a change that small is hard to see on the scale of Mhash/s.

Admittedly, there won't be much of a speed improvement just yet, as I haven't really gone after the main loop.  The vast majority of the changes I've made apply only to the code that runs just before the work is inserted into the loop.  Also, leaving things to the assembler on the assumption that it will handle them leaves room for problems.  Sometimes a change you expect to take place doesn't, and you end up with extra instructions for the CPU to execute.  It's often best to head these off beforehand.
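To illustrate the "easy math" point in C rather than in the asm itself: a SHA-256 rotate right by 7 needs a matching left shift by 32 - 7, and that subtraction can either be left for the tools to fold or written out as 25 by hand. A rough sketch, assuming nothing about the actual asm in the post; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Generic rotate right, valid for n in 1..31.
 * The (32 - n) is only folded away if the tool knows n ahead of time. */
uint32_t rotr(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* Same rotate with the arithmetic done by hand: 32 - 7 = 25.
 * Nothing is left for the compiler or assembler to work out. */
uint32_t rotr7_by_hand(uint32_t x)
{
    return (x >> 7) | (x << 25);
}

/* A constant expression like this one is evaluated at build time,
 * so writing "32 - 7" with literal operands costs nothing at run time. */
static_assert(32 - 7 == 25, "shift count folded at build time");
```

With literal operands the two forms should come out the same; the hand-written version only matters where the tool can't see that a value is constant.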