Hello,
I spent several days optimizing the Scrypt OpenCL code. It was an interesting challenge, since low-level code optimization, especially of cryptographic code, is both my primary work and my hobby. SHA-256 is very familiar to me; in particular, I contributed to the SHA-256 assembler optimizations in OpenSSL.
The current Scrypt OpenCL code is already very well optimized. My first implementation was 10 times slower than it! In the end, though, I achieved a small speed-up. I have tested it on a few AMD GPUs under Windows with the latest AMD drivers (13.11), and the results are as follows:
HD 6770 and HD 7950 - 2-3%
HD 7770 - no change
R9 280X - no change in -g 2 mode, but again 2% in -g 1 mode.
Here is the new OpenCL code:
http://www.crark.net/download/scrypt130511.zip
Instructions:
0) Back up your existing scrypt130511.cl file.
1) Unzip the archive and copy the new file into the cgminer folder, overwriting the old one. If your cgminer uses a different filename (such as scrypt130302.cl), rename the new file to that name.
All cgminer 3.x versions should be supported.
2) Delete all *.bin files (like scrypt130511Tahitiglg2tc8192w256l4.bin).
3) Restart cgminer and enjoy. Your previous settings, such as lookup-gap and thread-concurrency, should stay as they are.
Please let me know what results you get, including your hardware and driver version.
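For anyone who would rather script steps 0 to 2 than do them by hand, here is a minimal Python sketch. It is only an illustration under some assumptions: the function name `install_kernel`, its parameters, and the `.bak` backup suffix are made up for this example, and the default kernel filename is simply the one mentioned above. Restarting cgminer (step 3) is still done manually.

```python
import shutil
from pathlib import Path


def install_kernel(cgminer_dir, new_kernel, kernel_name="scrypt130511.cl"):
    """Back up the old kernel, copy the new one in, and clear cached binaries.

    All names here are hypothetical; pass kernel_name="scrypt130302.cl"
    (or similar) if your cgminer expects a different filename.
    """
    cgminer_dir = Path(cgminer_dir)
    target = cgminer_dir / kernel_name

    # Step 0: keep a copy of the existing kernel so it can be restored later.
    if target.exists():
        shutil.copy2(target, target.with_name(target.name + ".bak"))

    # Step 1: overwrite the kernel, using the filename this cgminer expects.
    shutil.copy2(new_kernel, target)

    # Step 2: delete cached compiled kernels so they are rebuilt on restart.
    removed = []
    for bin_file in list(cgminer_dir.glob("*.bin")):
        bin_file.unlink()
        removed.append(bin_file.name)
    return sorted(removed)
```

The function returns the names of the deleted *.bin files, so you can see which cached kernels will be recompiled the next time cgminer starts.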
If you like my work, please donate BTC or LTC.
SY, Pavel Semjanov.
The modifications seem to be working well for me. After some config tweaks, I went from 640 KH/s at best to 670 KH/s w/Sapphire Vapor-X 7950 w/boost. My efficiency and shares/min seem to have noticeably improved as well. Here's my cgminer config for anyone interested:
"api-allow" : "W:127.0.0.1",
"api-listen" : true,
"expiry" : "3",
"log" : "5",
"queue" : "2",
"scan-time" : "1",
"scrypt" : true,
"kernel" : "scrypt",
"auto-fan" : true,
"gpu-threads" : "1",
"gpu-engine" : "1150",
"gpu-memclock" : "1500",
"intensity" : "19",
"temp-target" : "70",
"temp-overheat" : "85",
"temp-cutoff" : "95",
"temp-hysteresis" : "3",
"gpu-powertune" : "20",
"gpu-vddc" : "1.25",
"worksize" : "256",
"lookup-gap" : "2",
"shaders" : "1792",
"vectors" : "1",
"thread-concurrency" : "21712"
Still playing around with the gpu-vddc setting, but something about these GPUs doesn't seem to like anything less than 1.25, regardless of what I set my clocks to (I've set them way lower hoping to be able to drop the voltage, but they always end up crashing). One day when I'm feeling more ambitious I may try to flash the GPU BIOS. I have also played around with thread-concurrency quite a bit, but 21712 seems to be the sweet spot.
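If you copy settings like the ones above into your own config, it can help to check that the fragment still parses after editing. The sketch below is a rough illustration, not part of cgminer: it assumes the lines sit inside the top-level JSON object of the config file, and the helper names `load_fragment` and `check_temps`, as well as the temperature-ordering check itself, are made up for this example.

```python
import json


def load_fragment(fragment):
    """Parse a brace-less run of "key" : "value" lines as one JSON object."""
    return json.loads("{" + fragment.strip().rstrip(",") + "}")


def check_temps(cfg):
    """Illustrative sanity check only: target < overheat < cutoff."""
    target = int(cfg["temp-target"])
    overheat = int(cfg["temp-overheat"])
    cutoff = int(cfg["temp-cutoff"])
    return target < overheat < cutoff


# A few of the settings from the config above.
fragment = '''
"gpu-engine" : "1150",
"gpu-memclock" : "1500",
"temp-target" : "70",
"temp-overheat" : "85",
"temp-cutoff" : "95",
"thread-concurrency" : "21712"
'''

cfg = load_fragment(fragment)
print(cfg["thread-concurrency"], check_temps(cfg))
```

A stray or missing comma makes `json.loads` raise an error, which is easier to spot here than when cgminer refuses to start.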
To the OP: Unfortunately I don't have any LTC or BTC atm - I had to sell most of them a couple of months ago to cover other expenses and just recently started mining again. What little funds I have right now are invested in other coins. Happen to have a TAG, WDC or NXT address? I would be happy to send a few over to you if so.