Re: List of all Bitcoin addresses ever used
Board: Project Development
by NotATether on 20/08/2020, 12:40:56 UTC
⭐ Merited by LoyceV (4)
Quote
-S will tell your machine to use at most 65% CPU
I think you mean RAM, not CPU. This VM has only 256 MB, so I'll let "sort" figure it out on its own.

That is correct: the argument to -S is the amount of memory sort(1) uses for its main buffer (manpage source). Given a percentage, it reserves that share of physical memory. But I think even a 256 MB buffer is far too small for the size of the dataset you're sorting; it will hit the disk too much.
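To illustrate the two forms -S accepts (the filenames here are just placeholders):

Code:
# Give sort's main buffer a share of physical RAM:
sort -S 65% -o sorted.txt addresses.txt
# Or an absolute size (K/M/G suffixes work too):
sort -S 8G -o sorted.txt addresses.txt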

Quote
-T puts temporary files in a directory (here named "tmp") and not in RAM; if you have an SSD, the speed isn't too shabby
That's default behaviour Smiley It doesn't have an SSD though, and I'm using "cputool" to keep server load low. I'm okay without daily updates on this; I wouldn't want users to download this large file on a daily basis anyway.
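For completeness, this is roughly the full invocation I had in mind, pointing the temporary files at whatever your fastest disk is (the paths are placeholders, and LC_ALL=C just makes sort compare raw bytes, which is noticeably faster on plain ASCII data like addresses):

Code:
mkdir -p /mnt/fast-disk/tmp
LC_ALL=C sort -S 65% -T /mnt/fast-disk/tmp -o addresses_sorted.txt addresses.txt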

Quote
I have sorted huge lists (>80 GB) on budget laptops using these two arguments. Worth a shot! If you want better hosting, PM me.
Since last year, I've been using an AWS server donated by suchmoon for loyce.club. However, since AWS charges $0.15/GB, I'm not comfortable hosting very large files on suchmoon's server.
When I tested sorting data on AWS, it started throttling disk IO after a while, which made it very slow. I've also tested a pay-by-the-hour VPS, and obviously it was a lot faster.

That's strange, because AWS instances are normally provisioned with SSD-backed boot volumes. If you are sorting inside a VM, though, all of that sorting happens on a virtual hard disk: the data doesn't just spill from memory into temporary space on the host's SSD, it has to pass through a virtual disk file sitting on that SSD, which puts extra strain on your hypervisor's emulated disk controller.

So it's emulating every disk controller call that reads and writes data, updates the disk cache, and so on, while sort(1) shuttles data between its memory buffer in RAM and the hard disk (which is really just a file on your host). And it's doing that for the entire 31 GB of addresses: with a small buffer, sort falls back to an external merge sort, which keeps temporary files roughly the size of the input on disk and re-reads and rewrites them over several merge passes, so the total disk traffic ends up being a multiple of those 31 GB. On top of that come the hardware-accelerated reads and writes the host does to the VM's disk file. That explains the poor performance while sorting.

You'll have better disk performance if you sort outside of a VM.