Showing 20 of 77 results by BOARBEAR
Post
Topic
Board Altcoin Discussion
Re: Ripple Giveaway!
by
BOARBEAR
on 11/05/2013, 10:26:35 UTC
rphpkn1zmukkNVhjgbxWXZ31pVgWtATVC4
Post
Topic
Board Mining
Topic OP
how is GCN architecture for mining?
by
BOARBEAR
on 08/06/2012, 22:06:37 UTC
I have seen people saying that the GCN architecture is not as good at mining as the VLIW architecture.
The 7970 seems to do quite well at mining, though.
It made me wonder: from an architectural point of view, is it really a bad architecture for mining? Could it be better than the old architecture?
Post
Topic
Board CPU/GPU Bitcoin mining hardware
Re: How should I cool my 5970 VRMs?
by
BOARBEAR
on 17/02/2012, 05:36:57 UTC
A simple test shows it does not.

870 MHz  397 Mhash/s  870/397 = 2.191
800 MHz  359 Mhash/s  800/359 = 2.228
500 MHz  202 Mhash/s  500/202 = 2.475

In theory it can't be linear, for the same reason a CPU does not scale linearly with core speed.
Don't spread false information.

I agree YOU should not spread false information.

Your numbers are garbage so likely you have suboptimal settings and the most likely culprit is memclock.

Do those results seem plausible to you?


870Mhz  397Mhash/s 397/870= 0.41 MH per Mhz
800Mhz 359Mhash/s 359/800 = 0.44 MH per Mhz
500Mhz 202Mhash/s 202/500 = 0.46 MH per Mhz

So the card is getting more efficient at higher clock (and likely higher temp)?  Does that seem plausible to you?

Likely you normally run @ 870 MHz and have found a more optimized memclock. You sloppily moved the core clock without adjusting the memclock, introducing timing delays that make the card less effective the further you move it from 870 MHz.

For the record @ 500 MHz I get 225 MH (0.450 MH/MHz) and @ 820 I get 375 MH/s (0.457 MH/MHz).


Your calculation is garbage.
202/500 = 0.404, not 0.46
397/870 = 0.456, not 0.41
These are just the reciprocals of the numbers I gave above.
And I ran all of the tests above under identical conditions except for the core frequency.
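To make the arithmetic in this exchange easy to check, here is a small C sketch that computes both ratios for the three data points quoted above. The `Sample` type and the function names are made up for illustration. If hashrate scaled perfectly linearly with core clock, MH/MHz would be identical at every clock; MHz/MH is simply its reciprocal, which is the form the first post used.

```c
#include <stdio.h>

/* One measurement from the thread: core clock in MHz, hashrate in MH/s. */
typedef struct {
    double core_mhz;
    double mhash;
} Sample;

/* MH/s delivered per MHz of core clock; constant under linear scaling. */
double mh_per_mhz(Sample s) { return s.mhash / s.core_mhz; }

/* Reciprocal form used in the original post: MHz of core clock per MH/s. */
double mhz_per_mh(Sample s) { return s.core_mhz / s.mhash; }

/* Print both ratios for each sample so they can be compared side by side. */
void print_scaling(const Sample *samples, size_t count) {
    for (size_t i = 0; i < count; i++) {
        printf("%4.0f MHz  %3.0f MH/s  %.3f MH/MHz  %.3f MHz/MH\n",
               samples[i].core_mhz, samples[i].mhash,
               mh_per_mhz(samples[i]), mhz_per_mh(samples[i]));
    }
}
```

Calling `print_scaling` on `{{870, 397}, {800, 359}, {500, 202}}` gives roughly 0.456, 0.449 and 0.404 MH/MHz. On these particular numbers the per-MHz figure is not constant; whether that reflects the card itself or an untuned memclock is exactly what is being argued above.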
Post
Topic
Board CPU/GPU Bitcoin mining hardware
Re: How should I cool my 5970 VRMs?
by
BOARBEAR
on 17/02/2012, 01:06:31 UTC
Performance does not scale linearly with clock speed at all. (Try 500 MHz and 800 MHz and you will see.)

Of course it does. Unless perhaps you are using SDK 2.6.
A simple test shows it does not.

870 MHz  397 Mhash/s  870/397 = 2.191
800 MHz  359 Mhash/s  800/359 = 2.228
500 MHz  202 Mhash/s  500/202 = 2.475

In theory it can't be linear, for the same reason a CPU does not scale linearly with core speed.
Don't spread false information.
Post
Topic
Board CPU/GPU Bitcoin mining hardware
Re: How should I cool my 5970 VRMs?
by
BOARBEAR
on 16/02/2012, 07:40:09 UTC
Well, no. Voltage is only one part of power usage, higher clocked cards use more current than lower clocked cards at the same voltage. That's why overclocked cards get hotter even when not overvolted.

Yes, power consumption scales linearly with clockspeed. But performance also scales linearly with clockspeed.
Higher performance at equal MH/W for the cards, means higher efficiency for the overall rig since CPU, MB, RAM, etc remain the same.

Power consumption scales quadratically with voltage. Clockspeed (and thus performance) does not. Therefore undervolting almost always gives better MH/W, but overclocking is completely sane from an efficiency POV.
Performance does not scale linearly with clock speed at all. (Try 500 MHz and 800 MHz and you will see.)
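The efficiency argument in the quoted posts rests on the usual first-order CMOS dynamic-power model, P ≈ C * V^2 * f. Here is a C sketch under that assumption only: leakage, fans and VRM losses are ignored, and `c_eff` and `mh_per_mhz` are hypothetical constants, not measurements. It shows why, if hashrate were exactly proportional to clock, the clock would cancel out of MH/W while voltage enters squared.

```c
/* Simplified dynamic-power model (an assumption, not a measurement):
 *   P = c_eff * V^2 * f
 * Static/leakage power, fans and VRM losses are ignored. */
double dynamic_power_w(double c_eff, double volts, double mhz) {
    return c_eff * volts * volts * mhz;
}

/* Card-level efficiency assuming hashrate = mh_per_mhz * clock.
 * That proportionality is exactly what is disputed in this thread,
 * so treat the result as an idealized upper bound on the argument. */
double mh_per_watt(double c_eff, double volts, double mhz, double mh_per_mhz) {
    double mhash = mh_per_mhz * mhz;
    return mhash / dynamic_power_w(c_eff, volts, mhz);
}
```

With the same `c_eff`, dropping from 1.00 V to 0.90 V improves MH/W by a factor of 1/0.81, about 1.23, at any clock, while changing the clock alone leaves MH/W unchanged. If hashrate does not scale linearly with clock, as argued in the reply, the clock no longer cancels exactly.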
Post
Topic
Board Mining software (miners)
Re: CGMINER GPU bitforce overclock monitor fanspeed RPC in C linux/windows/osx 2.2.3
by
BOARBEAR
on 07/02/2012, 12:51:00 UTC
Debugging requires:
-D --verbose -T
as the README file says...
[2012-02-07 07:49:21] Started cgminer 2.2.3                    
[2012-02-07 07:49:21] CL Platform 0 vendor: Advanced Micro Devices, Inc.                    
[2012-02-07 07:49:21] CL Platform 0 name: AMD Accelerated Parallel Processing                    
[2012-02-07 07:49:21] CL Platform 0 version: OpenCL 1.1 AMD-APP (898.1)                    
[2012-02-07 07:49:21] Platform 0 devices: 1                    
[2012-02-07 07:49:21] Testing pool http://127.0.0.1:8332                    
[2012-02-07 07:49:21] Popping work to stage thread                    
[2012-02-07 07:49:21] Popping work to work thread                    
[2012-02-07 07:49:21] Successfully retrieved and deciphered work from pool 0 http://127.0.0.1:8332                    
[2012-02-07 07:49:21] Pushing pooltest work to base pool                    
[2012-02-07 07:49:21] Pool 0 http://127.0.0.1:8332 active                    
[2012-02-07 07:49:21] Pushing ping to longpoll thread                    
[2012-02-07 07:49:21] Pushing work to getwork queue
[2012-02-07 07:49:21] Pushing ping to thread 0
[2012-02-07 07:49:21] Popping work to stage thread
[2012-02-07 07:49:21] Init GPU thread 0 GPU 0 virtual GPU 0
[2012-02-07 07:49:21] CL Platform vendor: Advanced Micro Devices, Inc.                    
[2012-02-07 07:49:21] CL Platform name: AMD Accelerated Parallel Processing                    
[2012-02-07 07:49:21] CL Platform version: OpenCL 1.1 AMD-APP (898.1)                    
[2012-02-07 07:49:21] List of devices:                    
[2012-02-07 07:49:21]    0   Cypress
[2012-02-07 07:49:21] No long-poll found on any pool server
[2012-02-07 07:49:21] Selected 0: Cypress                    
[2012-02-07 07:49:21] Preferred vector width reported 4                    
[2012-02-07 07:49:21] Max work group size reported 256                    
[2012-02-07 07:49:21] No binary found, generating from source                    
[2012-02-07 07:49:21] Setting worksize to 64                    
[2012-02-07 07:49:21] Patched source to suit 4 vectors                    
[2012-02-07 07:49:21] cl_amd_media_ops found, setting BITALIGN                    
[2012-02-07 07:49:21] BFI_INT patch requiring device found, patched source with BFI_INT                    
[2012-02-07 07:49:21] CompilerOptions: -D WORKSIZE=64 -D VECTORS4 -D BITALIGN -D BFI_INT                    
[2012-02-07 07:49:21] Error: Building Program (clBuildProgram)                    
[2012-02-07 07:49:21]                    
[2012-02-07 07:49:21] Failed to init GPU thread 0, disabling device 0                    
[2012-02-07 07:49:21] Restarting the GPU from the menu will not fix this.                    
[2012-02-07 07:49:21] Try restarting cgminer.                    
[2012-02-07 07:49:21] Init GPU thread 1 GPU 0 virtual GPU 0                    
[2012-02-07 07:49:21] CL Platform vendor: Advanced Micro Devices, Inc.                    
[2012-02-07 07:49:21] CL Platform name: AMD Accelerated Parallel Processing                    
[2012-02-07 07:49:21] CL Platform version: OpenCL 1.1 AMD-APP (898.1)                    
[2012-02-07 07:49:21] List of devices:                    
[2012-02-07 07:49:21]    0   Cypress                    
[2012-02-07 07:49:21] Selected 0: Cypress                    
[2012-02-07 07:49:21] Preferred vector width reported 4                    
[2012-02-07 07:49:21] Max work group size reported 256                    
[2012-02-07 07:49:21] No binary found, generating from source                    
[2012-02-07 07:49:21] Setting worksize to 64                    
[2012-02-07 07:49:21] Patched source to suit 4 vectors                    
[2012-02-07 07:49:21] cl_amd_media_ops found, setting BITALIGN                    
[2012-02-07 07:49:21] BFI_INT patch requiring device found, patched source with BFI_INT                    
[2012-02-07 07:49:21] CompilerOptions: -D WORKSIZE=64 -D VECTORS4 -D BITALIGN -D BFI_INT                    
[2012-02-07 07:49:21] Error: Building Program

cgminer crashed
Post
Topic
Board Mining software (miners)
Re: CGMINER GPU bitforce overclock monitor fanspeed RPC in C linux/windows/osx 2.2.3
by
BOARBEAR
on 07/02/2012, 12:13:47 UTC
Presuming (on Windows) you ran a previous version and it worked, and the new version doesn't ...

The useful output would be that of "cgminer -n" and "cgminer -D -T ..." (where ... are the other options you normally use)
[2012-02-07 07:12:29] Started cgminer 2.2.3                   
[2012-02-07 07:12:29] CL Platform 0 vendor: Advanced Micro Devices, Inc.                   
[2012-02-07 07:12:29] CL Platform 0 name: AMD Accelerated Parallel Processing                   
[2012-02-07 07:12:29] CL Platform 0 version: OpenCL 1.1 AMD-APP (898.1)                   
[2012-02-07 07:12:29] Platform 0 devices: 1                   
[2012-02-07 07:12:29] Testing pool http://127.0.0.1:8332                   
[2012-02-07 07:12:29] Popping work to stage thread                   
[2012-02-07 07:12:29] Popping work to work thread                   
[2012-02-07 07:12:29] Successfully retrieved and deciphered work from pool 0 http://127.0.0.1:8332                   
[2012-02-07 07:12:29] Pushing pooltest work to base pool                   
[2012-02-07 07:12:29] Pool 0 http://127.0.0.1:8332 active                   
[2012-02-07 07:12:29] Pushing ping to longpoll thread                   
[2012-02-07 07:12:29] Pushing work to getwork queue
[2012-02-07 07:12:29] Pushing ping to thread 0
[2012-02-07 07:12:29] Popping work to stage thread
[2012-02-07 07:12:29] Init GPU thread 0 GPU 0 virtual GPU 0
[2012-02-07 07:12:29] CL Platform vendor: Advanced Micro Devices, Inc.                   
[2012-02-07 07:12:29] CL Platform name: AMD Accelerated Parallel Processing                   
[2012-02-07 07:12:29] CL Platform version: OpenCL 1.1 AMD-APP (898.1)                   
[2012-02-07 07:12:29] List of devices:                   
[2012-02-07 07:12:29]    0   Cypress
[2012-02-07 07:12:29] No long-poll found on any pool server
[2012-02-07 07:12:29] Selected 0: Cypress                   
[2012-02-07 07:12:29] Preferred vector width reported 4                   
[2012-02-07 07:12:29] Max work group size reported 256                   
[2012-02-07 07:12:29] No binary found, generating from source                   
[2012-02-07 07:12:29] Setting worksize to 64                   
[2012-02-07 07:12:29] Patched source to suit 4 vectors                   
[2012-02-07 07:12:29] cl_amd_media_ops found, setting BITALIGN                   
[2012-02-07 07:12:29] BFI_INT patch requiring device found, patched source with BFI_INT                   
[2012-02-07 07:12:29] CompilerOptions: -D WORKSIZE=64 -D VECTORS4 -D BITALIGN -D BFI_INT                   
[2012-02-07 07:12:29] Error: Building Program (clBuildProgram)                   
[2012-02-07 07:12:29]                     
[2012-02-07 07:12:29] Failed to init GPU thread 0, disabling device 0                   
[2012-02-07 07:12:29] Restarting the GPU from the menu will not fix this.                   
[2012-02-07 07:12:29] Try restarting cgminer.                   
[2012-02-07 07:12:29] Init GPU thread 1 GPU 0 virtual GPU 0                   
[2012-02-07 07:12:29] CL Platform vendor: Advanced Micro Devices, Inc.                   
[2012-02-07 07:12:29] CL Platform name: AMD Accelerated Parallel Processing                   
[2012-02-07 07:12:29] CL Platform version: OpenCL 1.1 AMD-APP (898.1)                   
[2012-02-07 07:12:29] List of devices:                   
[2012-02-07 07:12:29]    0   Cypress                   
[2012-02-07 07:12:29] Selected 0: Cypress                   
[2012-02-07 07:12:29] Preferred vector width reported 4                   
[2012-02-07 07:12:29] Max work group size reported 256                   
[2012-02-07 07:12:29] No binary found, generating from source                   
[2012-02-07 07:12:29] Setting worksize to 64                   
[2012-02-07 07:12:29] Patched source to suit 4 vectors                   
[2012-02-07 07:12:29] cl_amd_media_ops found, setting BITALIGN                   
[2012-02-07 07:12:29] BFI_INT patch requiring device found, patched source with BFI_INT                   
[2012-02-07 07:12:29] CompilerOptions: -D WORKSIZE=64 -D VECTORS4 -D BITALIGN -D BFI_INT                   
[2012-02-07 07:12:29] Error: Building Program
Post
Topic
Board Mining software (miners)
Re: CGMINER GPU bitforce overclock monitor fanspeed RPC in C linux/windows/osx 2.2.3
by
BOARBEAR
on 06/02/2012, 13:40:14 UTC
I can't start the miner now with the new version:

[2012-02-06 08:39:17] No long-poll found on any pool server
[2012-02-06 08:39:17] Failed to init GPU thread 0, disabling device 0
[2012-02-06 08:39:17] Restarting the GPU from the menu will not fix this.
[2012-02-06 08:39:17] Try restarting cgminer.
Press enter to continue:
Post
Topic
Board Mining software (miners)
Re: CGMINER GPU bitforce overclock monitor fanspeed RPC in C linux/windows/osx 2.2.0
by
BOARBEAR
on 29/01/2012, 21:54:19 UTC
I have 2 GPUs, one 5870 and one 3xxx.
This newer version does not work on my setup; the previous one used to work fine.
Post
Topic
Board Mining software (miners)
Re: cat 12.2 preview is out
by
BOARBEAR
on 27/01/2012, 16:37:34 UTC
The 100% CPU bug is back, omg!
Rolling back to 11.12.
Post
Topic
Board Mining software (miners)
Re: Modified Kernel for Phoenix 1.5
by
BOARBEAR
on 24/01/2012, 04:38:31 UTC
Note: the new SDK version works best with worksize 64 on a 5870.
Post
Topic
Board CPU/GPU Bitcoin mining hardware
Re: Ufasoft Miner 0.25 - Windows/Linux, x86/x64, SSE2/OpenCL/CUDA, Open Source
by
BOARBEAR
on 17/01/2012, 15:07:56 UTC
Something is wrong with miner version 0.25.
It was working fine with the earlier version.

Now I get "Found NONCE not accepted by Target"
Post
Topic
Board Mining
Re: Want legit 7970 testing/benchmarking and tuning for cgminer and Diablominer?
by
BOARBEAR
on 05/01/2012, 00:55:06 UTC
I will be pretty upset if the 7xxx comes out only marginally better for mining, because I just sold both 5970s at $300 a pop on eBay getting ready for the 7xxx series. My miner is down right now, waiting for the new card to be released.

Why would you do that?  The discussion has been for months that GCN architecture may not be mining friendly.  

Your sold 5970s delivered 1.4 to 1.5 GH. You do realize that it is a pipe dream to think the 7970 will be ANYWHERE near that. Worse, at launch the card is likely going to sell at $549, and even then availability may be difficult.

Diablo may be able to pull some respectable performance out of the card (eventually), but your sale was downright foolish. If the 7970 could someday deliver 500 MH, that would be decent. 600 MH+ would be amazing. It isn't going to deliver 1.5 GH. You can't optimize something out of nothing.
He did not say 7970. He could wait for a 7890 or something that uses VLIW4, which is better at mining.
Post
Topic
Board Mining software (miners)
Re: BitMinter miner (FAST, cool GUI, zero installation, Windows/Linux/Mac)
by
BOARBEAR
on 28/12/2011, 20:14:53 UTC
I would try this miner if it were not written in Java. Any chance of making one in C?

I might make a command-line version in the future, but it will most likely also run on a Java Virtual Machine. Why do you prefer C?

Does this miner work on other pools, or just your own? What about backup pool support?

So far only in my pool. Support for backup pools is planned, but everything takes time.

I don't use any other Java programs, so I don't have Java installed on my system, and I don't wanna install Java just for one miner.
I prefer anything that's not Java, not just C.
Post
Topic
Board Mining software (miners)
Re: BitMinter miner (FAST, cool GUI, zero installation, Windows/Linux/Mac)
by
BOARBEAR
on 26/12/2011, 03:36:18 UTC
I would try this miner if it were not written in Java. Any chance of making one in C?
Post
Topic
Board Mining software (miners)
Re: *Catalyst 12.1 Preview* Decreased performance, anyone else confirm?
by
BOARBEAR
on 22/12/2011, 23:24:36 UTC
Oh, by the way, there are two different versions of the 12.1 preview.

The one I tried is this:
http://developer.amd.com/Downloads/OpenCL1.2-Static-Cplus-preview-drivers-Windows.exe

It has a newer OpenCL than the other 12.1 preview.
Post
Topic
Board Mining software (miners)
Re: CGMINER CPU/GPU miner overclock monitor fanspeed in C linux/windows/osx 2.0.8
by
BOARBEAR
on 22/12/2011, 22:56:15 UTC
Does anyone know how I can use cgminer with phatk 2.1?
Post
Topic
Board Mining software (miners)
Re: *Catalyst 12.1 Preview* Decreased performance, anyone else confirm?
by
BOARBEAR
on 22/12/2011, 15:43:09 UTC
For those who got a lower hashrate with 12.1 in cgminer:

Try worksize 64 with vectors 4.
Post
Topic
Board Mining software (miners)
Re: Modified Kernel for Phoenix 1.5
by
BOARBEAR
on 16/08/2011, 15:20:46 UTC
It seems like your latest kernel and mine have problems if BFI_INT gets forced off via (BFI_INT=false)... it seems the results are invalid every time.
Any idea, Phateus?

Perhaps #define Ch(x, y, z) bitselect(x, y, z) is not right?

Edit: solved. The non-BFI_INT Ch has to be:
Code:
#define Ch(x, y, z) bitselect(z, y, x)

If you want to thank someone, you can donate to 1LY4hGSY6rRuL7BQ8cjUhP2JFHFrPp5JVe (Vince -> who did a GREAT job during my kernel development)!

Dia

Awesome, thank you! I was under the assumption that BFI_INT and bitselect were the same operation; apparently, the operand order is different. I will fix it in my next release.
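A quick way to see the operand-order difference is to emulate OpenCL's `bitselect(a, b, c)` (each result bit comes from b where c is 1, otherwise from a) in plain C and compare both macro variants against the SHA-256 Ch function from FIPS 180-4. This is a host-side sketch; `bitselect_u32`, `ch_ref`, `ch_wrong` and `ch_fixed` are illustrative names, not part of any kernel in this thread.

```c
#include <stdint.h>

/* Software stand-in for OpenCL bitselect(a, b, c):
 * take bits of b where c is 1, bits of a where c is 0. */
uint32_t bitselect_u32(uint32_t a, uint32_t b, uint32_t c) {
    return (a & ~c) | (b & c);
}

/* SHA-256 Ch as defined in FIPS 180-4: where x is 1 pick y, else z. */
uint32_t ch_ref(uint32_t x, uint32_t y, uint32_t z) {
    return (x & y) ^ (~x & z);
}

/* The broken macro: bitselect(x, y, z) uses z as the selector. */
uint32_t ch_wrong(uint32_t x, uint32_t y, uint32_t z) {
    return bitselect_u32(x, y, z);
}

/* The fixed macro from the thread: bitselect(z, y, x) uses x as the selector. */
uint32_t ch_fixed(uint32_t x, uint32_t y, uint32_t z) {
    return bitselect_u32(z, y, x);
}
```

For x = 0xF0F0F0F0, y = 0x12345678, z = 0x9ABCDEF0, `ch_ref` and `ch_fixed` both return 0x1A3C5E70, while `ch_wrong` returns 0x72747670, which is consistent with every result being invalid when BFI_INT was forced off.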

Thank you everyone for your support (both in BTC and discussion).

I should have a drop-in version of the kernel available for cgminer soon; for anyone wanting to try out the pre-release, I'll be posting it tonight.

@BOARBEAR
*sigh*.... come on man... do you even read my posts? There is no single cause of the bad performance. 2.2 executes fewer instructions and uses fewer registers than 2.1, but as I said... there is some weird issue which makes OpenCL slower behind the scenes. My best guess is that it has to do with register allocation.

The GPU has a total of 256x32x4 registers (8192 UINT4). At most, there are 256 threads per workgroup (8192/256 = 32 registers per thread). Using VECTORS, the number of registers is far below this number, so the hardware can run the maximum allowable number of threads at a time. However, when you compile with VECTORS4, there are more than 32 registers per thread. OpenCL must then determine how to allocate the threads, and utilization of the video card is sub-optimal. Below is a diagram of what I think is going on.


4 thread groups running simultaneously VECTORS (2 running at a time)
[1111111122222222]
[3333333344444444]

Using an optimal version of VECTORS4, it would look much like this (double the work is done per thread):
[1111111111111111]
[2222222222222222]
[3333333333333333]
[4444444444444444]

Now, making it use slightly fewer resources will make it slower, because the threads are out of sync and there will be overhead in syncing and tracking data within threadgroups:
[1111111111111112]
[2222222222222233]
[3333333333333444]
[4444444444445555]

Now, I may be waaaaay off here, but something like this is what makes sense to me, especially since this would explain why decreasing the memory actually improves performance in some cases (by forcing synchronization).
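The register arithmetic in this post can be written out as a small C sketch. It uses only the figures stated above (8192 UINT4 registers, at most 256 threads per workgroup) and deliberately ignores real-hardware details such as register allocation granularity and wavefront scheduling. The function names are hypothetical.

```c
#include <stdbool.h>

/* Figures taken from the post, not checked against AMD documentation. */
#define REG_FILE_UINT4 8192u        /* 256 x 32 UINT4 registers */
#define MAX_THREADS_PER_GROUP 256u

/* Register budget per thread when a full workgroup is resident. */
unsigned regs_per_thread_budget(void) {
    return REG_FILE_UINT4 / MAX_THREADS_PER_GROUP;
}

/* Threads the register file can hold at once for a kernel that needs
 * regs_per_thread UINT4 registers each (a partially fitting thread
 * does not count, hence integer division). */
unsigned resident_threads(unsigned regs_per_thread) {
    if (regs_per_thread == 0) {
        return 0; /* guard against division by zero */
    }
    return REG_FILE_UINT4 / regs_per_thread;
}

/* True when a whole 256-thread workgroup fits in the register file. */
bool full_group_fits(unsigned regs_per_thread) {
    return regs_per_thread > 0 &&
           resident_threads(regs_per_thread) >= MAX_THREADS_PER_GROUP;
}
```

Under this model a kernel needing exactly 32 registers per thread fits a full group, while one needing 33 gives `resident_threads(33) == 248`, just short of a full group; that kind of partial fit is what the diagram above tries to illustrate.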

Anyway, enough of my off-topic analysis...


I will release a version that will work with cgminer early next week (looks like he has already implemented Diapolo's old version).


Looking forward to this !!

Just sent one coin your way, and there's another once the work is done.

Quote
We are hitting a ceiling with OpenCL in general (and perhaps with the current hardware). In one of the mining threads, vector76 and I were discussing the theoretical limit on hashing speeds... and unless there is a way to make the Maj() operation take 1 instruction, we are within about a percent of the theoretical minimum number of instructions in the kernel, unless we are missing something.
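For reference, one well-known two-operation form of Maj() uses the same select primitive: Maj(x, y, z) equals bitselect(x, y, x ^ z), because where x and z agree the majority bit is x, and where they differ it is y. The C sketch below checks this identity on the host with a software stand-in for OpenCL's `bitselect`. The function names are illustrative, and the sketch says nothing about how many ALU slots a given compiler actually emits on a given GPU.

```c
#include <stdint.h>

/* Software stand-in for OpenCL bitselect(a, b, c): b where c is 1, else a. */
uint32_t bitselect_u32(uint32_t a, uint32_t b, uint32_t c) {
    return (a & ~c) | (b & c);
}

/* SHA-256 Maj as defined in FIPS 180-4. */
uint32_t maj_ref(uint32_t x, uint32_t y, uint32_t z) {
    return (x & y) ^ (x & z) ^ (y & z);
}

/* One XOR plus one select: where x == z the majority is x, otherwise y. */
uint32_t maj_select(uint32_t x, uint32_t y, uint32_t z) {
    return bitselect_u32(x, y, x ^ z);
}
```

Because both functions are purely bitwise, checking the eight single-bit combinations is enough to show they agree for every 32-bit input.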

Out of curiosity, have you looked into trying to code a version directly in AMD's assembly language and bypassing OpenCL entirely? (I'm thinking: since we're already patching the ELF output, this seems like the logical next step.)

Also, have you looked at AMD CAL? I know this is what ufasoft's miner uses (https://bitcointalk.org/index.php?topic=3486.500), and also what zorinaq considers the most efficient way to access AMD hardware (somewhere on http://blog.zorinaq.com).



Replacing one instruction in the ELF with another that uses the exact same inputs/outputs is one thing, but manually editing the ASM code is another thing entirely. Besides, with the work that has been done the GPU is already at >99% of the theoretical maximum throughput. (ALU packing) And as said above, we are also close to the theoretical minimum number of instructions to correctly run SHA256.

Also, if you look near the end of the hdminer thread you will notice that users are able to get the same hashrates from phatk on 69xx. For 58xx and other VLIW5 cards phatk is significantly faster than hdminer. If that's the best he can do with CAL then I don't see any reason to use it. hdminer had a substantial performance advantage back in March/April, but with basically every miner supporting BFI_INT this is no longer the case.

Agreed, the kernel itself is pretty optimal. I might look into calling lower-level CAL functions to manage the (OpenCL-compiled) GPU threads (instead of using OpenCL), but I doubt this will give any speedup (although I might be able to reduce the CPU overhead).
I understand what you are saying. Perhaps version 2.1 will be the last version that works well with VECTORS4. You said the work that has been done on the GPU is already at >99% of the theoretical maximum throughput. But VECTORS4 alone gives me about a 1.5% boost (a contradiction?). That is why I tried hard to find a way to make VECTORS4 work, so that future versions can use it.
Post
Topic
Board Mining software (miners)
Re: Modified Kernel for Phoenix 1.5
by
BOARBEAR
on 14/08/2011, 17:58:50 UTC
I tried to figure out why version 2.2 does not work well with VECTORS4, but I could not find the cause, as I do not have enough knowledge.
Here are some results I found:

Replacing this block of code in version 2.1 with the corresponding block in version 2.2 makes VECTORS4 much slower:


#define P1(n) ((rot(W[(n)-2],15u)^rot(W[(n)-2],13u)^((W[(n)-2])>>10U)))
#define P2(n) ((rot(W[(n)-15],25u)^rot(W[(n)-15],14u)^((W[(n)-15])>>3U)))
#define P3(x)  W[x-7]
#define P4(x)  W[x-16]


//Partial Calcs for constant W values
#define P1C(n) ((rotate(ConstW[(n)-2],15)^rotate(ConstW[(n)-2],13)^((ConstW[(n)-2])>>10U)))
#define P2C(n) ((rotate(ConstW[(n)-15],25)^rotate(ConstW[(n)-15],14)^((ConstW[(n)-15])>>3U)))
#define P3C(x)  ConstW[x-7]
#define P4C(x)  ConstW[x-16]

//SHA round with built in W calc
#define sharoundW(n)  Vals[(3 + 128 - (n)) % 8] += t1W(n); Vals[(7 + 128 - (n)) % 8] = t1W(n) + t2(n);  

//SHA round without W calc
#define sharound(n) Vals[(3 + 128 - (n)) % 8] += t1(n); Vals[(7 + 128 - (n)) % 8] = t1(n) + t2(n);

//SHA round for constant W values
#define sharoundC(n) Barrier(n); Vals[(3 + 128 - (n)) % 8] += t1C(n); Vals[(7 + 128 - (n)) % 8] = t1C(n) + t2(n);

//The compiler is stupid... I put this in there only to stop the compiler from (de)optimizing the order
#define Barrier(n) t1 = t1C((n) % 64)

And this block is not the only thing that causes the problem.
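The rotate amounts in P1/P2 can look odd next to the SHA-256 specification, which defines σ1 with right rotations by 17 and 19 and σ0 with right rotations by 7 and 18. OpenCL's `rotate()` rotates left, and a left rotate by k on a 32-bit word equals a right rotate by 32 - k, so 15/13 and 25/14 describe the same functions. The host-side C sketch below checks that; `rotl32`, `rotr32` and the other names are illustrative, and `schedule_word` assumes the kernel sums P1..P4 into W[n] as in the standard message schedule.

```c
#include <stdint.h>

/* Left and right rotates on 32-bit words; n is assumed to be 1..31. */
uint32_t rotl32(uint32_t x, unsigned n) { return (x << n) | (x >> (32u - n)); }
uint32_t rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << (32u - n)); }

/* sigma1 written like the P1 macro: left rotates by 15 and 13. */
uint32_t p1_style(uint32_t w) { return rotl32(w, 15) ^ rotl32(w, 13) ^ (w >> 10); }
/* FIPS 180-4 sigma1: ROTR 17 ^ ROTR 19 ^ SHR 10. */
uint32_t sigma1_ref(uint32_t w) { return rotr32(w, 17) ^ rotr32(w, 19) ^ (w >> 10); }

/* sigma0 written like the P2 macro: left rotates by 25 and 14. */
uint32_t p2_style(uint32_t w) { return rotl32(w, 25) ^ rotl32(w, 14) ^ (w >> 3); }
/* FIPS 180-4 sigma0: ROTR 7 ^ ROTR 18 ^ SHR 3. */
uint32_t sigma0_ref(uint32_t w) { return rotr32(w, 7) ^ rotr32(w, 18) ^ (w >> 3); }

/* Standard message-schedule step for n >= 16:
 * W[n] = sigma1(W[n-2]) + W[n-7] + sigma0(W[n-15]) + W[n-16],
 * i.e. P1(n) + P3(n) + P2(n) + P4(n) in the macro names above. */
uint32_t schedule_word(const uint32_t *W, int n) {
    return p1_style(W[n - 2]) + W[n - 7] + p2_style(W[n - 15]) + W[n - 16];
}
```

Since the two forms are the same function, a slowdown between 2.1 and 2.2 would have to come from how the compiler schedules or allocates registers for them, not from a difference in what they compute.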

I am guessing there is something to do with rotC function.(it is a guess only