At least your question didn't get closed as a "duplicate".

But I would change the matrix multiplication algorithm to Strassen's, which runs in roughly O(n^2.81) and is asymptotically faster than standard O(n^3) matmul. Changing the algorithm generally makes a much more noticeable difference to performance.
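
For reference, here is a minimal sketch of what Strassen's algorithm could look like in Python. It is only an illustration under some assumptions: the matrices are square lists of lists, the function names (`naive_matmul`, `strassen`) and the `cutoff` parameter are made up for this example, and any block with an odd size simply falls back to the standard method rather than being padded.

```python
def naive_matmul(a, b):
    """Standard O(n^3) product of two square matrices (lists of lists)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def _add(x, y):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]


def _sub(x, y):
    return [[p - q for p, q in zip(r, s)] for r, s in zip(x, y)]


def _split(m):
    """Split a matrix of even size into its four quadrants."""
    h = len(m) // 2
    return ([row[:h] for row in m[:h]], [row[h:] for row in m[:h]],
            [row[:h] for row in m[h:]], [row[h:] for row in m[h:]])


def strassen(a, b, cutoff=64):
    """Strassen multiplication: 7 recursive products instead of 8.

    `cutoff` is an illustrative tuning knob: at or below this size
    (or when the size is odd) we fall back to the standard algorithm.
    """
    n = len(a)
    if n <= cutoff or n % 2:
        return naive_matmul(a, b)

    a11, a12, a21, a22 = _split(a)
    b11, b12, b21, b22 = _split(b)

    # The seven Strassen products.
    m1 = strassen(_add(a11, a22), _add(b11, b22), cutoff)
    m2 = strassen(_add(a21, a22), b11, cutoff)
    m3 = strassen(a11, _sub(b12, b22), cutoff)
    m4 = strassen(a22, _sub(b21, b11), cutoff)
    m5 = strassen(_add(a11, a12), b22, cutoff)
    m6 = strassen(_sub(a21, a11), _add(b11, b12), cutoff)
    m7 = strassen(_sub(a12, a22), _add(b21, b22), cutoff)

    # Recombine the products into the four quadrants of the result.
    c11 = _add(_sub(_add(m1, m4), m5), m7)
    c12 = _add(m3, m5)
    c21 = _add(m2, m4)
    c22 = _add(_add(_sub(m1, m2), m3), m6)

    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

Keep in mind the extra additions and allocations mean Strassen only wins for fairly large matrices, so the cutoff needs tuning for your setup, and with floating-point input the result can differ slightly from the standard method because of rounding.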