I don't design ASICs, but I've tinkered with FPGAs a bit.
The optimisations listed in the paper all appear plausible. The omission of DRAM refresh is clever; I seem to recall having heard of a similar trick in the distant past, but I may be mistaken.
Domino logic is a technique for ultra-fast, reduced-power, reduced-die-size logic design. It has a lot of pitfalls, and it is difficult and time-consuming to design with. However, if your design has one particular circuit which is slower than all the others and is the limiting factor for your clock speed (the so-called critical path), then its benefits may be worth the effort. Salsa makes extremely heavy use of addition, so it's not surprising that the adder is on the critical path.
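To make the point about addition concrete, here is a rough Python sketch of the Salsa20 quarter-round as I understand it from the Salsa20 specification (the function names `rotl32` and `quarter_round` are my own, not from the paper). Each step needs the result of the previous one, so the four 32-bit additions cannot overlap; in hardware the rotations are just wiring and the XORs are cheap, which leaves the carry chain of the adder as the slow part.

```python
MASK32 = 0xFFFFFFFF  # Salsa20 works on 32-bit words


def rotl32(value, shift):
    """Rotate a 32-bit word left by `shift` bits (free in hardware: just wiring)."""
    value &= MASK32
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def quarter_round(x0, x1, x2, x3):
    """Salsa20 quarter-round: four add-rotate-xor steps in a strict chain.

    Every addition uses the output of the step before it, so the four
    32-bit adders sit one after another on the same logic path.
    """
    y1 = x1 ^ rotl32((x0 + x3) & MASK32, 7)
    y2 = x2 ^ rotl32((y1 + x0) & MASK32, 9)
    y3 = x3 ^ rotl32((y2 + y1) & MASK32, 13)
    y0 = x0 ^ rotl32((y3 + y2) & MASK32, 18)
    return (y0, y1, y2, y3)


if __name__ == "__main__":
    # Example input (1, 0, 0, 0), checked by hand against the steps above.
    print([hex(w) for w in quarter_round(1, 0, 0, 0)])
    # prints ['0x8008145', '0x80', '0x10200', '0x20500000']
```

This is only a software model of the data dependencies, not a statement about how the paper's circuit is actually laid out.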
The key thing about ASIC design is that it's one thing to design a digital circuit in an FPGA and port it to an ASIC; it's quite another to optimise it for silicon. If you have enough time and enough experts, you can produce an in-depth analysis of the circuit and hand-draw the most critical parts, or use a variety of other clever tricks.
It's the difference between careful expert design and rapid-turnaround design that you see between Bitfury's 55 nm ASIC and KNC's 20 nm ASIC, which have almost identical cost and performance.