It's possible if you have unlimited time to waste, but would you really want to edit 500K HTML pages manually?
Write (or adapt) some Python or C++ scripts instead.

1-Concatenate all 500K HTML pages into one file (or do 50K pages at a time)
2-Use some condition to delete or keep all lines that contain a given string
3-Sort the result, and you're done
I did try this.
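
Here is a rough Python sketch of the three steps above, not the exact script I used. The function names (`iter_lines`, `filter_lines`, `process`), the `*.html` pattern, and the UTF-8 encoding are all assumptions for illustration; swap in whatever condition fits your pages. It streams the files instead of physically writing one giant file first, which amounts to the same thing for filtering.

```python
from pathlib import Path
from typing import Callable, Iterable, Iterator


def iter_lines(pages_dir: Path, pattern: str = "*.html") -> Iterator[str]:
    """Step 1: stream every line of every page as if it were one big file."""
    for page in sorted(pages_dir.rglob(pattern)):
        # errors="replace" is an assumption: it keeps going on badly encoded pages.
        with page.open(encoding="utf-8", errors="replace") as fh:
            for line in fh:
                yield line.rstrip("\n")


def filter_lines(lines: Iterable[str], keep: Callable[[str], bool]) -> Iterator[str]:
    """Step 2: keep only the lines that satisfy the condition."""
    return (line for line in lines if keep(line))


def process(pages_dir: Path, out_file: Path, keep: Callable[[str], bool]) -> int:
    """Steps 1-3: concatenate, filter, sort, write. Returns the number of lines kept."""
    kept = sorted(filter_lines(iter_lines(pages_dir), keep))
    out_file.write_text("".join(line + "\n" for line in kept), encoding="utf-8")
    return len(kept)
```

For example, `process(Path("pages"), Path("kept_lines.txt"), lambda line: "<title>" in line)` would keep only the lines containing `<title>` (the directory and output file names here are made up). To delete matching lines instead of keeping them, negate the condition with `not in`. Note that `sorted` holds all kept lines in memory, so if the filter keeps too much, run it in batches of 50K pages as in step 1.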
