[quote]Thanks for the update, the last .gz you had I think was from September.[/quote]
Correct (August 6 and September 2).
[quote]As an FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket that can be accessed by a server.[/quote]
Amazon charges $0.09 per GB of outgoing data; that's ridiculous for this purpose (my current 5 TB bandwidth limit would cost $450 per month when maxed out). And Amazon wants my credit card instead of Bitcoin.
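For the record, that figure checks out (using decimal TB; a sketch, not an AWS invoice):
[code]
# 5 TB of egress at $0.09/GB, with 1 TB = 1000 GB
echo "5 * 1000 * 0.09" | bc    # 450.00 USD per month
[/code]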
[quote]If you want to update a file that takes a lot of resources, you can create a VM, execute a script that updates the file, and upload it to an S3 bucket (on AWS). You would then be able to access that file using another VM that takes fewer resources.[/quote]
Still, that's quite excessive for just 2 files that are barely used.
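For completeness, though, the quoted workflow is only a couple of commands. A minimal sketch, assuming the AWS CLI is installed and configured; the bucket name, file name, and update script are placeholders:
[code]
# on the big VM: regenerate the file, then push it to the bucket
./update_list.sh                          # hypothetical script that rebuilds the .gz
aws s3 cp list.gz s3://my-bucket/list.gz

# on the small VM (or any client): pull it back down
aws s3 cp s3://my-bucket/list.gz .
[/code]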
[quote]Separately, sorting lists is not scalable, period.[/quote]
Actually, sort performs quite well. I've tested:
10M lines: 10 seconds (fits in RAM)
50M lines: 63 seconds (starts using temporary files)
250M lines: 381 seconds (using 2 GB RAM and temporary files)
So a 5 times larger file takes 6 times longer to sort, which is about what you'd expect from n log n plus the extra disk I/O. I'd say that's quite scalable.
It just takes a while because it uses temporary disk storage. With enough RAM, it can utilize multiple cores.
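If anyone wants to reproduce numbers like these, something along these lines works with GNU sort (a sketch; the line count, buffer size, and paths are just examples):
[code]
# generate an unsorted test file (50M numeric lines, roughly 440 MB)
seq 50000000 | shuf > lines.txt

# time the sort: -S caps the RAM buffer, -T sets the temp-file directory,
# --parallel spreads the work across cores
time sort -S 2G --parallel=4 -T /tmp -o lines.sorted lines.txt
[/code]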
[quote]There are some things you can do to increase the speed, such as keeping the list in RAM or reducing the number of passes over the entire list, but you ultimately cannot sort a very large unordered list.[/quote]
The 256 GB RAM server idea would cost a few dollars per hour, so I'll make do with less.
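And less works fine, because you don't need the whole list in RAM to sort it: you can sort a huge unordered list chunk by chunk and merge the sorted chunks, which is exactly what sort's temporary files are doing. A hand-rolled version for illustration (file names and chunk size are placeholders):
[code]
# external merge sort by hand: split, sort each chunk, merge
split -l 10000000 huge.txt chunk_             # 10M-line chunks
for f in chunk_*; do sort -o "$f.sorted" "$f"; done
sort -m -o huge.sorted chunk_*.sorted         # -m merges already-sorted files
rm chunk_*
[/code]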