Yes, testing hypotheses and theories is done by making observations. How else would you test a theory?
Okay, but at what level of scientific rigor? We observe constantly, but those observations don't necessarily lend themselves to the development of strong, scientific theories. This is especially true given that theories (which are ultimately mathematical constructs) often face the problem of "undecidability," or the inability to determine whether one plausible interpretation of a set of data is more true than some other plausible interpretation. As far as I'm aware, this just sounds like 'naturalistic observation,' which has never been synonymous with 'experiment.'
Edit: A theory can be complete, incomplete, right, or dead wrong. I think we can agree that we're both interested in 'good' theories which are testable, replicable, supported by multiple experiments and data sets, etc. Although I think modern evolutionary theory is supported by a large data set, I'm still having difficulty understanding how it is testable and replicable.
Edit 2: I've posited many times on this forum that, given the available evidence, there are alternative theories that are at least equally plausible (e.g., the evidence supports a theory that evolution in conscious states leads to evolved physical states rather than vice versa). How would you propose we test the theory of evolution against these other plausible theories?
The E. coli long-term evolution experiment (http://en.wikipedia.org/wiki/E._coli_long-term_evolution_experiment), which I mentioned earlier, is an example of a way to experiment with evolution.
Every time a new fossil is described, that adds to the data set. If a fossil is found which does not fit the evolutionary theory, then the theory must be updated to account for it.
RE Edit 2: I am not sure I understand what the heck you are talking about. What do you mean by conscious states vs. physical states?
Let's assume for a moment that certain evidence that indicates the possibility of a misinterpretation of the fossil record (e.g. proven-though-minimal changes in rates of radioactive isotope decay over time, the idea that "deeper" doesn't always equate to "older" when examining unearthed fossils, etc.) wouldn't have any substantial impact on current modern evolutionary theory if known absolutely. Let's just assume that we have a near-perfect perception and understanding of the evidence we've collected that supports the theory.
I have difficulty grasping how evolution is wholly testable because there are two kinds of fallibility, and only one kind is commonly referenced in science, i.e. if you find evidence to disprove the theory, then it's either a bad theory or needs improving. The other kind of fallibility is philosophical in nature -- given two seemingly equally plausible theories, and given evidence that appears to equally support both, how can you test one theory against the other?
A common-but-flawed argument against evolution that's been put forth is the idea that evolution isn't fallible because you can always make some piece of evidence fit the current theory. As you pointed out, this isn't true because a piece of fossil evidence that deviates from the theory suggests the theory itself is flawed, as it isn't comprehensive enough to include the new evidence. But I'm struggling with the alternative type of fallibility. What if you have two theories that are equally supported by the evidence? How do you determine that one is fallible against the other?
Usually, this type of fallibility isn't a concern. After all, if you find evidence to disprove evolution, then you know that theory needs to be replaced by a better one. But what about a case in which all data that has been found, and all evidence that ever could be found, supports two theories equally?
For example, let's say two people are getting married and you are trying to develop a theory as to why they got married. A behaviorist psychologist might say that they're getting married due to a series of stimuli and responses, a neuroscientist might say they are getting married due to a complex series of electrical signals that facilitate the release of neurochemicals that provide the couple with feelings of love and attachment, and the couple themselves might just say they're getting married because they love each other and they want to. After examining all the evidence at hand, you will likely find that the evidence fully supports each of these theories. This relates back to the problem of mathematical undecidability of theories -- which is the best one?
The evidence supporting evolution equally supports at least one alternative theory. Modern evolutionary theory describes a mechanism for adaptation through common descent by way of vertical and lateral gene transfer. However, the evidence equally supports a theory in which the mechanism for adaptation isn't vertical and lateral gene transfer, but rather evolution in states of consciousness which are evidenced by vertical and lateral gene transfer and the resulting changes in genotype and phenotype. This theory posits that we did not descend from LUCA, the last universal common ancestor, but rather LUCCA, the last universal common conscious agent.
If two theories are both supported by the evidence, then you have to find somewhere that their predictions disagree and probe that area.
In your example of two people getting married, those three explanations are just three ways of saying the same thing; they do not disagree with each other.
Let me see if I understand this correctly: you are trying to decide between (change in genotype yields adaptations) vs (desire for adaptation yields change in genotype)? Are you suggesting bacteria and plants are conscious agents? I know there is much anthropomorphization going on in schools when evolution is taught, e.g., the giraffes wanted to reach the higher leaves of trees so they grew longer necks, but that is just analogical hand-waving to help people grasp a complicated subject.