That is very arguable.
You can have unit tests, execution speed tests and all kinds of other tests to see whether a plugin is good or not.
Yes, you can test plugins for anything you want. That is easy, as long as the "what you want" part is well defined. Finding out exactly what you want these plugins to do is very difficult. OP says the program "pays humans to do things that it can't do". In order to ask humans to do such things, it must be able to grasp what it cannot do, which is actually harder than doing these things in the first place. Here's a book that offers some good insights on this matter:
http://en.wikipedia.org/wiki/I_Am_a_Strange_Loop
Evolutionary algorithms are just one kind of learning algorithm. There are many others, such as neural networks, swarm techniques, etc. They mostly fall short for really hard problems because they cannot handle sufficient complexity yet.
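For anyone who hasn't seen one, here is a rough sketch of what an evolutionary algorithm looks like on a toy problem (maximize the number of 1 bits in a list). The function name `evolve`, all of its parameters and the toy fitness function are my own assumptions for illustration, not anything from OP's program. Note that it only works because the fitness function is trivially well defined, which is exactly the part that is missing for the hard problems.

```python
import random


def evolve(target_len=20, pop_size=30, generations=200,
           mutation_rate=0.05, seed=0):
    """Toy evolutionary algorithm (hypothetical example): maximize the 1 bits."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    fitness = sum              # fitness = number of 1 bits in the individual

    # Start from a random population of bit lists.
    population = [[rng.randint(0, 1) for _ in range(target_len)]
                  for _ in range(pop_size)]

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == target_len:
            break  # perfect individual found

        # Selection: the fitter half survives unchanged (elitism).
        parents = population[: pop_size // 2]

        # Reproduction: one-point crossover followed by bit-flip mutation.
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, target_len)
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)

        population = parents + children

    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(sum(best), "of", len(best), "bits set")
```

Swap the fitness function for "is this plugin what the user actually wanted?" and the loop itself stays the same, but nobody knows how to write that function.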