This would divide the mining power by n (with n the number of sub-networks), and thus reduce the resilience of the network (how hard it would be for an attacker to modify the ledger) by a factor of n.
Yes. However, there is no requirement to have more sub-networks than desired. E.g. it could start out with just 5 sub-networks. All addresses in a tx would then have to leave the same remainder mod 5. There is not even a need to roll out all 256 sub-networks afforded by sharding on the first byte.
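To make the rule above concrete, here is a minimal sketch in Python. It assumes the shard index is the integer value of the leading address byte(s) taken modulo the number of sub-networks; that reading, and the names `shard_of` and `same_shard`, are my own illustration and not part of any existing protocol.

```python
from typing import Iterable


def shard_of(address: bytes, n_shards: int, prefix_len: int = 1) -> int:
    """Map an address to a shard index (hypothetical rule, for illustration).

    Assumption: the shard is the big-endian integer value of the first
    `prefix_len` bytes of the address, taken modulo `n_shards`.
    """
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    if len(address) < prefix_len:
        raise ValueError("address is shorter than the shard prefix")
    return int.from_bytes(address[:prefix_len], "big") % n_shards


def same_shard(addresses: Iterable[bytes], n_shards: int) -> bool:
    """Return True if every address in a tx maps to the same shard."""
    shards = {shard_of(a, n_shards) for a in addresses}
    return len(shards) <= 1


# Example with 5 sub-networks: first bytes 0x0a (10) and 0x0f (15) both
# leave remainder 0, while 0x0b (11) leaves remainder 1.
tx_ok = [bytes([0x0A, 0x01]), bytes([0x0F, 0x02])]
tx_bad = [bytes([0x0A, 0x01]), bytes([0x0B, 0x02])]
```

Under this sketch, `same_shard(tx_ok, 5)` holds and `same_shard(tx_bad, 5)` does not, so only the first tx could be handled inside a single sub-network.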
Hence having "16777216 sub-networks" is not viable at all (and that's why Ethereum won't implement sharding before using PoS).
Yes, of course. It was just an indication of how many sub-networks would be possible by sharding on 3 bytes. I also do not believe that we would ever need to scale that far. The point is rather that the principle allows for practically unlimited scaling.
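The numbers in this exchange are easy to check. The sketch below is only arithmetic: it assumes one sub-network per possible value of the address prefix, and it takes the objection at face value that mining power splits evenly across sub-networks. The function names are hypothetical.

```python
def max_shards(prefix_bytes: int) -> int:
    """Upper bound on sub-networks when sharding on `prefix_bytes` bytes.

    Each byte has 256 possible values, so the bound is 256 ** prefix_bytes.
    """
    if prefix_bytes < 0:
        raise ValueError("prefix_bytes must be non-negative")
    return 256 ** prefix_bytes


def hash_share_per_shard(n_shards: int) -> float:
    """Fraction of total mining power securing one shard.

    Assumption: mining power is split evenly, as in the quoted objection.
    """
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    return 1.0 / n_shards


one_byte = max_shards(1)               # 256
three_bytes = max_shards(3)            # 16777216
share_at_5 = hash_share_per_shard(5)   # 0.2
```

With 5 sub-networks each one is secured by a fifth of the total mining power under that assumption, whereas at the 3-byte maximum the share per sub-network becomes negligible, which is the concern raised in the quote.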
There is obviously a need to figure out how to start sharding, with how many shards initially, and how to (automatically?) increase the number of shards when needed, but I assume that this problem does not need to be solved yet. The deployment could cause a fork in the network. Therefore, allowing the existing and the new situation to co-exist correctly will be a very real problem to solve, with a lot of nitty-gritty details to take into account. If I tried to dream up such details right now, on the spot, they would undoubtedly be grossly wrong.