Thank you very much! But bad news: I checked phatk 2.0, my old and my new kernel versions, and all of them use fewer GPRs but 1 - 2 more ALU OPs ... SDK 2.5 is a sucker until (again) some optimisations have been done.
Phat, how do you order the commands to achieve the best performance? Are you using the ASM code from KernelAnalyzer, or is it trial and error?
Dia
edit: BTW, I always thought your numbers were a couple lower than mine because you defined OUTPUT_MASK as something like "0x10"... doing that makes all my numbers match the ones in your thread
lol.... mostly trial and error. Initially, for version 1.1, I looked at filling the gaps in the VLIW assembly (seeing which VLIW5 bundles only had 4 instructions, and using barrier(0) instructions as markers to see where in the assembly the OpenCL code ends up), but that took a LONG time, and I think I am done with that... (it turned out it only gave me like 3 ALU ops anyway).
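For anyone who wants to try the gap-hunting described above without reading the dump by eye, here is a rough sketch in Python. The line layout it parses (a bundle number followed by `x:`/`y:`/`z:`/`w:`/`t:` slots) is only an assumption loosely modelled on a VLIW5 ISA listing, and `underfilled_bundles` and `SAMPLE` are made-up names for illustration, not anything from phatk or KernelAnalyzer. Lines that do not look like a slot are simply skipped.

```python
import re

# ASSUMED line shapes (not taken from any real dump in this thread):
#   "    1  x: ADD_INT  R2.x, R1.x, R0.x"   <- a number starts a new bundle
#   "       y: XOR_INT  R2.y, R1.y, R0.y"   <- further slots of the same bundle
SLOT_RE = re.compile(r"^\s*(\d+)?\s*([xyzwt]):\s+(\S+)")


def underfilled_bundles(listing, width=5):
    """Return (bundle_number, used_slots) for bundles using fewer than `width` slots."""
    bundles = {}
    current = None
    for line in listing.splitlines():
        match = SLOT_RE.match(line)
        if not match:
            continue  # headers, clause lines, blank lines, etc.
        number, slot, _opcode = match.groups()
        if number is not None:
            # A leading number opens a new bundle.
            current = int(number)
            bundles[current] = []
        if current is not None:
            bundles[current].append(slot)
    return [(n, len(slots)) for n, slots in sorted(bundles.items()) if len(slots) < width]


# Hypothetical two-bundle listing: bundle 0 is full, bundle 1 has a free slot.
SAMPLE = """
    0  x: ADD_INT  R1.x, R0.x, R2.x
       y: XOR_INT  R1.y, R0.y, R2.y
       z: AND_INT  R1.z, R0.z, R2.z
       w: OR_INT   R1.w, R0.w, R2.w
       t: MOV      R3.x, R0.x
    1  x: ADD_INT  R2.x, R1.x, R0.x
       y: XOR_INT  R2.y, R1.y, R0.y
       z: AND_INT  R2.z, R1.z, R0.z
       w: BFI_INT  R2.w, R1.w, R0.w, R3.x
"""

print(underfilled_bundles(SAMPLE))
```

On the sample this reports bundle 1 as using only 4 of 5 slots. It only tells you where the gaps are; finding an instruction that can legally move into one is still the slow trial-and-error part.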
Seems you are wrong (at least for now):
Read it again.. he is asking if it is in 2.4. He says he read here that it will be in 2.5, and asks whether it isn't already in the current one, meaning 2.4. They answer no, it was disabled in the current one (meaning 2.4) as it wasn't fixed in time.
At least that is how I read it... note the dates of the posts.. they have to be talking about 2.4
Yeah, I said that KernelAnalyzer 1.9 was out today saying that it supports 2.5, but 2.5 isn't out yet... probably tomorrow.
And, I just posted another kernel... this one is much better to look at than 2.0... I got rid of all but 3 of the SHARound #defines... Check the first page for the link