As an FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket that can be accessed by a server.
Amazon charges $0.09 per GB of outgoing data, which is ridiculous for this purpose (my current 5 TB bandwidth limit would cost $450 per month when maxed out). And Amazon wants my credit card instead of Bitcoin.
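For anyone who wants to check that figure, here is a quick sketch of the arithmetic. It assumes a flat $0.09 per GB with no free tier or volume discount, and decimal units (1 TB = 1000 GB); the function name is made up for illustration.

```python
def monthly_egress_cost(tb_per_month, usd_per_gb=0.09, gb_per_tb=1000):
    """Flat-rate egress cost in USD; ignores free tiers and volume discounts."""
    return tb_per_month * gb_per_tb * usd_per_gb


# 5 TB per month at the quoted rate
print(f"${monthly_egress_cost(5):.2f} per month")
```

With binary units (1 TB = 1024 GB) the result would come out slightly higher, so treat the number as a rough estimate.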
I had used AWS as an example because I believed you used it for some of your other projects.
Yes, transferring data to the internet is very expensive. You can use a CDN (content delivery network) to reduce costs a little bit. 5 TB of data is a lot.
Separately, sorting lists is not scalable, period.
Actually, sort performs quite well. I've tested:
10M lines: 10 seconds (fits in RAM)
50M lines: 63 seconds (starts using temporary files)
250M lines: 381 seconds (using 2 GB RAM and temporary files)
So a 5 times larger file takes 6 times longer to sort. I'd say scalability is quite good.
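To put those timings next to a textbook expectation, the sketch below compares the observed slowdown with what a pure n log n cost model would predict for the same sizes. The n log n model is an assumption for comparison only; it says nothing about how sort is actually implemented, and the helper name is invented for this example.

```python
import math

# (lines, seconds) as reported in the post above
runs = [(10_000_000, 10), (50_000_000, 63), (250_000_000, 381)]


def growth_ratios(runs):
    """For each consecutive pair, return (observed time ratio, n*log2(n) ratio)."""
    out = []
    for (n0, t0), (n1, t1) in zip(runs, runs[1:]):
        observed = t1 / t0
        predicted = (n1 * math.log2(n1)) / (n0 * math.log2(n0))
        out.append((observed, predicted))
    return out


for observed, predicted in growth_ratios(runs):
    print(f"observed x{observed:.2f}, n log n predicts x{predicted:.2f}")
```

The observed ratios land a little above the roughly 5.5x that n log n predicts for a 5x larger input. That gap would be consistent with the extra cost of spilling to temporary files, though three data points are not enough to say for sure.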
I think you are proving my point. The more input you have, the more time it takes to process one additional input.
To put it another way, it takes 1 unit of time to sort a list with a length of 2, it takes 1 + a units of time to sort a list with a length of 3, it takes 1 + a + b units of time to sort a list with a length of 4, and so on. The longer the list, the longer it will take to sort one additional line.
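How fast that per-line cost grows can be sketched under the same assumed n log n model (an assumption, not a measurement of sort): the cost of one additional line is the difference between sorting n and n - 1 lines. The function name is hypothetical.

```python
import math


def marginal_cost(n):
    """Extra work to go from n-1 to n items under an assumed n*log2(n) cost model."""
    return n * math.log2(n) - (n - 1) * math.log2(n - 1)


for n in (10_000_000, 50_000_000, 250_000_000):
    print(f"n={n:>11,}: ~{marginal_cost(n):.1f} units for one more line")
```

Under this model the cost of one more line does rise with list length, but only logarithmically: going from 10M to 250M lines raises it by roughly a fifth, not by a factor of 25.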
As an FYI, you generally will not want to host files on a server. You will probably want to host files in a storage bucket that can be accessed by a server.
If you want to update a file and the update takes a lot of resources, you can create a VM, execute a script that updates the file, and upload it to an S3 bucket (on AWS). You would then be able to access that file using another VM that takes fewer resources.
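The upload step of that workflow could look roughly like the sketch below. It is only an illustration: `publish`, the bucket name, and the key are made up, and the client is passed in so the function works with anything that offers an `upload_file(Filename, Bucket, Key)` method, such as a boto3 S3 client (assumed to be installed and configured with credentials).

```python
def publish(local_path, bucket, key, s3_client):
    """Upload a freshly generated file to a bucket and return its s3:// location.

    s3_client is any object with an upload_file(Filename, Bucket, Key) method,
    for example boto3.client("s3").
    """
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


# Usage on the "builder" VM (hypothetical names; needs boto3 and AWS credentials):
#   import boto3
#   publish("list.txt", "example-bucket", "list.txt", boto3.client("s3"))
```

The smaller VM would then read the object from the bucket instead of regenerating it. Downloads from the bucket to the internet are still billed as outgoing data.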
That may save on local resources, but you will be paying a lot of money per month if people download several hundred gigabytes each month, particularly if the files are large like the files hosted in the OP.
If you have the network capacity then it's better to just serve it locally (except, AWS bills your upload traffic too).
Your local ISP might not like it very much if you are uploading that much data.