I was looking to build something similar but remembered you had already laid the groundwork. I built a native Windows version with Clang: a regular .exe for Windows machines, not a WSL build.
No DB name given - compute only mode
Points/launch: 4092
Range overhead: 2960
Required range: 1000000000
Adjusted range: 1000002960
Base Key: 0000000000000000000000000000000000000000000000000000000000000001
Last Key: 000000000000000000000000000000000000000000000000000000003b9ad590
Batch add: using 255 KB [T: 1 L: 244380 x 4] Memory/thread: 211 kB
Computing ~ 1000002960 points...
[99.4%] [63 s] BatchAdd speed: 15771075 keys/s [15.8 Mk/ts]
Overall gen & store speed: 15771132.227 keys/s
Total clock time: 63.408
Total wall time: 63.407
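As a sanity check on the log above, the figures can be reproduced with a few lines of Python. This is only a sketch: it assumes the adjusted range is the required range rounded up to a whole number of launches of 4092 points, and that the last key is the base key plus the adjusted range minus one. Both match the output, but they are my reading of it, not something confirmed from the tool's source. The variable names are made up for illustration.

```python
# Values copied from the log above.
points_per_launch = 4092
required_range = 1_000_000_000
base_key = 0x1
wall_time_s = 63.407

# Assumption: the range is rounded up to a whole number of launches.
launches = -(-required_range // points_per_launch)  # ceiling division
adjusted_range = launches * points_per_launch
overhead = adjusted_range - required_range

# Assumption: keys are consecutive, starting at the base key.
last_key = base_key + adjusted_range - 1

# Average throughput over the whole run.
keys_per_s = adjusted_range / wall_time_s

print(launches, overhead, adjusted_range)
print(f"{last_key:064x}")
print(f"{keys_per_s / 1e6:.1f} Mkeys/s")
```

This gives 244380 launches, an overhead of 2960 and a last key ending in 3b9ad590, so the log is self-consistent and the reported ~15.8 Mkeys/s is the average over the full run.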
I know it's a little slower than your tests, but I am wondering whether, without changing anything, this is the best speed I will get on a Windows machine (for my specific CPU, not all of them)...
Have you tried compiling a native Windows exe? If so, did your speed dip versus the Linux build?