I can't find a batch inversion implementation in your code. Do you use batch inversion to speed up computing the inverses?
The only place I've found anyone publicly using the Inverse Wild Herd is this Python script:
https://github.com/mikorist/Kangaroo-256-bit-python/blob/main/kangaroo.py
Everyone else seems to be hiding how it works.
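
For anyone following along, here is a minimal sketch of what batch inversion (Montgomery's trick) usually looks like. It is not taken from either poster's code; the names batch_inverse and SECP256K1_P are made up for illustration. The idea is that one modular inversion plus about 3(n - 1) multiplications replaces n separate inversions, which is why it helps when many kangaroos each need a denominator inverted per jump.

```python
# Hypothetical illustration of Montgomery's batch inversion trick.
# SECP256K1_P is the secp256k1 field prime; batch_inverse is an assumed name.
SECP256K1_P = 2**256 - 2**32 - 977


def batch_inverse(values, p=SECP256K1_P):
    """Return [v^-1 mod p for v in values] using a single modular inversion.

    Every value must be nonzero mod p, otherwise pow() raises ValueError.
    """
    n = len(values)
    if n == 0:
        return []

    # prefix[i] holds the product of values[0..i-1] (1 for i == 0).
    prefix = [1] * n
    acc = 1
    for i, v in enumerate(values):
        prefix[i] = acc
        acc = acc * v % p

    # The only real inversion: inverse of the product of all values.
    # Three-argument pow with exponent -1 needs Python 3.8 or newer.
    inv = pow(acc, -1, p)

    # Walk backwards, peeling one factor off the running inverse each step.
    out = [0] * n
    for i in range(n - 1, -1, -1):
        out[i] = inv * prefix[i] % p   # equals values[i]^-1
        inv = inv * values[i] % p      # now the inverse of values[0..i-1]
    return out


if __name__ == "__main__":
    sample = [2, 3, 5, 7, 123456789]
    inverses = batch_inverse(sample)
    print(all(v * w % SECP256K1_P == 1 for v, w in zip(sample, inverses)))
```

In a Kangaroo loop the values would typically be the (x2 - x1) denominators from the point additions of a whole group of kangaroos, inverted together once per step.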

Maybe you haven't been watching closely: I posted a Kangaroo Python script that uses the inverse herd last July or August, and explained all the equations showing why it works and why it brings Kangaroo down to 1.0 * sqrt(N) when DP = 0.

But TBH this has been known for many years; probably not many people bothered to apply it, since it's at most something like a 20% speed-up over the usual 2-kangaroo method.