You only need to store 0.5 sqrt(N) keys (the baby steps) : from 1*G to 0.5*sqrt(N)*G
And next step is to calculate the number 0.5*sqrt(2^134)
After this calculation you will lose all your interest in BSGS

Or may be you are going to break a lot of <64bit points? Even so, kang is faster because you can have precomputed db of DPs and reduce time in 10 times or even more.