I noticed that you output tuples of addresses using list(map). I suggest adjusting your program so it outputs a single address per line, so that we compare apples to apples. I am not sure whether the list output affects performance, which is why I am pointing it out.
The example I originally showed using iceland2k14/libsecp256k1 was about half the speed. With it I can generate 1 million addresses using all cores (16 in my case) and write them to a file in 4.8 seconds, a rate of about 208,300 keys/sec. I have modified my initial program so that you can now also configure the number of cores it runs on. That way one can set it to 1, so that we also compare apples to apples.
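A minimal sketch of what a configurable core count can look like, assuming Python's standard multiprocessing.Pool. fake_address is a hypothetical placeholder for the real key-to-address step (it is not the iceland2k14/libsecp256k1 API), and generate/keys_per_second are illustrative names. The rate is simply addresses divided by seconds, e.g. 1,000,000 / 4.8 s ≈ 208,333 keys/sec.

```python
import time
from multiprocessing import Pool


def fake_address(key: int) -> str:
    # HYPOTHETICAL placeholder for the real key-to-address step
    # (e.g. a libsecp256k1 binding); here it only formats the integer.
    return f"addr{key:064x}"


def generate(n: int, cores: int) -> list[str]:
    """Derive n addresses using the given number of worker processes."""
    keys = range(1, n + 1)
    if cores == 1:
        # Single-core run: no process pool, so no inter-process overhead.
        return [fake_address(k) for k in keys]
    with Pool(processes=cores) as pool:
        # chunksize batches keys so each round trip to a worker carries many items.
        return pool.map(fake_address, keys, chunksize=10_000)


def keys_per_second(n: int, seconds: float) -> float:
    """Throughput as addresses generated per wall-clock second."""
    return n / seconds


if __name__ == "__main__":
    n, cores = 1_000_000, 1  # set cores to 1 for an apples-to-apples comparison
    start = time.perf_counter()
    addresses = generate(n, cores)
    elapsed = time.perf_counter() - start
    print(f"{n} addresses on {cores} core(s): {keys_per_second(n, elapsed):,.0f} keys/sec")
```

Skipping the pool entirely when cores is 1 keeps the single-core measurement free of process start-up and pickling costs.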
To output one address per line, replace
print(*addresses, sep="\n")
print(*addresses2, sep="\n")
print(*addresses3, sep="\n")
with
for i in addresses:
    print(*i, sep="\n")
for i in addresses2:
    print(*i, sep="\n")
for i in addresses3:
    print(*i, sep="\n")
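An alternative sketch, assuming addresses, addresses2 and addresses3 are lists of tuples of address strings as above: flatten them with itertools.chain.from_iterable and hand the lines to a single writelines call instead of calling print once per tuple. iter_lines is a hypothetical helper name, the sample data is made up, and whether this is measurably faster was not benchmarked here.

```python
import sys
from itertools import chain


def iter_lines(*batch_lists):
    """Yield every address from one or more lists of tuples, newline-terminated.

    Accepts e.g. addresses, addresses2, addresses3 and flattens them lazily,
    so the whole output never has to be built as one big string in memory.
    """
    # Inner chain: lists -> tuples; outer chain: tuples -> address strings.
    for addr in chain.from_iterable(chain.from_iterable(batch_lists)):
        yield f"{addr}\n"


if __name__ == "__main__":
    # Hypothetical placeholder data standing in for the program's address tuples.
    addresses = [("addr_a", "addr_b"), ("addr_c",)]
    addresses2 = [("addr_d",)]
    sys.stdout.writelines(iter_lines(addresses, addresses2))
```

Because the generator streams lines, memory stays flat even for output the size of the 16 million line file below.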
The results are the same, only slightly faster:
time python3 gen_batches.py > addresses.out
real 3m5,736s
user 3m5,268s
sys 0m0,404s
wc addresses.out
16400196 16400196 573355137
less addresses.out
1K9oVg45UaApB1SmkxKFBZPCKsj2XiKCYw
1K9oVg45UaApB1SmkxKFBZPCKsj2XiKCYw
13hXNkmedXJ5UXt4RY5rLGAZGBgeSHUNT7
17G6zU4yVBM8WJ7TvJvkVkZr8qtJheAYVy
1LZtUG1S8V7yfwthDfJBy1Yg7LWxpZKmQX
1PDBxaiy3bp8woFbrErNAPoMF8W442B1Nb
1CAapBviMeMp91UjqgTaUfr1tGc5Lsvhgx
1Lmrc69RFRbwGUc3UTRGxumAApEDsJKUrf
1JNGqSWZdJFptRrLaCtmUJ6Hq1necKuTCd
18aQxp32qDDHEFWgVjexS1BG9MdFbQAc6N
....