My second version is slightly improved, but list cloning is still the bottleneck. For instance, if I have a 1 GB list and use 4 cores for the search, it uses 4 GB of RAM, because each core gets its own copy of the list.
Write a separate function that generates the list, and call it once from your main function.
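As a rough sketch of that suggestion, assuming the code is Python and the search runs in worker processes (the thread does not say either way): the list is built once by its own function, called from `main`, and each worker is handed only its own slice rather than the whole list. All names here (`generate_list`, `split_bounds`, `search_chunk`) and the sizes are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def generate_list(size):
    """Build the data once (hypothetical stand-in for the real 1 GB list)."""
    return list(range(size))


def split_bounds(length, parts):
    """Return (start, stop) index pairs that cover range(length) in `parts` pieces."""
    step, extra = divmod(length, parts)
    bounds, start = [], 0
    for i in range(parts):
        # The first `extra` pieces get one additional element each.
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def search_chunk(chunk, offset, target):
    """Return the positions of `target` in the full list, given one chunk and its offset."""
    return [offset + i for i, value in enumerate(chunk) if value == target]


def main(size=1000, target=750, workers=4):
    data = generate_list(size)  # generated once, in the parent process only
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Each task is sent only its slice of the list, not the full list.
        futures = [
            pool.submit(search_chunk, data[start:stop], start, target)
            for start, stop in split_bounds(len(data), workers)
        ]
        return [hit for future in futures for hit in future.result()]


if __name__ == "__main__":
    print(main())
```

With this layout the workers together receive roughly one copy of the data instead of one copy per core. The slices are still copies, so the parent briefly holds extra memory while they are being sent; whether that is acceptable depends on how the real list is produced.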