"The idea behind POW consensus in Bitcoin and similar systems is that if you expend energy to create a block and that block doesn't end up in the eventual consensus chain (because you were mining off a fork or making consensus-invalid blocks-- e.g. attacking) then the energy (and the cost of that energy) is wasted."
=> I think I understand this point. However, with the PoW I'm experimenting with, miners don't produce useful work while mining. What the PoW does imply is that hardware optimized for mining (chips, memory, etc.) is also well suited to deep learning workloads. At any given time a miner has to choose between mining and doing useful (AI) work; the same energy is never spent on both. IMO this might still be useful, because it incentivizes the development of mining chips, memory, etc. that have a second use, for example once a chip is replaced by a newer generation and is no longer profitable for mining. So if miners decide to mine with this PoW, the energy is still "wasted" in the sense above, and as far as I know that doesn't weaken consensus (let me know if I'm wrong). The distinction is that using this PoW could help the development and spread of chips that can also be used for AI and similar workloads.
Unrelated to the previous point: technically, the PoW is implemented as multiple rounds of matrix multiplications (with ternary weights for simplicity, the same kind of computation used in 1.58-bit LLMs, for example) plus noise derived from the nonce, and its security relies on the hardness of the Learning With Errors (LWE) problem.
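To make the shape of that computation concrete, here is a toy sketch. It is not the actual construction; every parameter and helper (`N`, `ROUNDS`, `Q`, `ternary_matrix`, `pow_digest`, `mine`), the use of SHAKE-256 to expand seeds, and the final SHA-256 target check are hypothetical choices for illustration. It assumes a public ternary matrix A derived from the block header, a state vector x seeded from header + nonce, and per-round small noise e_r derived from the nonce, with each round computing x <- A x + e_r (mod Q) in an LWE-like form. The toy sizes here give no real security.

```python
from __future__ import annotations

import hashlib

Q = 2**16     # hypothetical modulus for the LWE-style arithmetic
N = 16        # hypothetical matrix dimension (tiny, for illustration only)
ROUNDS = 4    # hypothetical number of mat-mult rounds


def _xof(*parts: bytes, length: int) -> bytes:
    """Deterministic byte stream from SHAKE-256 over the given parts."""
    h = hashlib.shake_256()
    for p in parts:
        h.update(len(p).to_bytes(4, "big"))  # length-prefix so parts can't collide
        h.update(p)
    return h.digest(length)


def _ternary(stream: bytes) -> list[int]:
    """Map each byte to {-1, 0, 1} (byte % 3 is slightly biased; fine for a sketch)."""
    return [b % 3 - 1 for b in stream]


def ternary_matrix(header: bytes) -> list[list[int]]:
    """Hypothetical public N x N ternary weight matrix derived from the header."""
    flat = _ternary(_xof(b"matrix", header, length=N * N))
    return [flat[i * N:(i + 1) * N] for i in range(N)]


def pow_digest(header: bytes, nonce: int) -> bytes:
    a = ternary_matrix(header)
    nb = nonce.to_bytes(8, "big")
    # Initial state vector derived from header + nonce, reduced mod Q.
    seed = _xof(b"state", header, nb, length=2 * N)
    x = [int.from_bytes(seed[2 * i:2 * i + 2], "big") % Q for i in range(N)]
    for r in range(ROUNDS):
        # Small nonce-derived noise vector e_r with entries in {-1, 0, 1}.
        e = _ternary(_xof(b"noise", header, nb, r.to_bytes(4, "big"), length=N))
        # One round: x <- A x + e_r (mod Q). With ternary weights each
        # "multiply" is just add, subtract, or skip.
        x = [(sum(w * xi for w, xi in zip(row, x)) + ei) % Q
             for row, ei in zip(a, e)]
    # Hash the final state so the target check works like ordinary hash PoW.
    return hashlib.sha256(b"".join(v.to_bytes(2, "big") for v in x)).digest()


def meets_target(digest: bytes, difficulty_bits: int) -> bool:
    """True if the digest has at least `difficulty_bits` leading zero bits."""
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def mine(header: bytes, difficulty_bits: int, max_nonce: int = 1 << 20) -> int | None:
    """Return the first nonce whose digest meets the target, or None."""
    for nonce in range(max_nonce):
        if meets_target(pow_digest(header, nonce), difficulty_bits):
            return nonce
    return None
```

For example, `mine(b"example header", 4)` returns a nonce after roughly 16 attempts on average, and a verifier recomputes `pow_digest` once for that nonce. Because the weights are ternary, the inner loop needs only additions and subtractions, which is the kind of kernel that hardware built for 1.58-bit LLM inference would also accelerate.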