With the permission of poster GP, I reproduce below [with very slight cleanups and a clarifying note and link or two] what formerly appeared as post no. 16 in the UD blog thread on a recent article by Prof P Olofsson, a professor of statistics and former commenter at UD, on the claimed defects of Intelligent Design reasoning as assessed by a statistician.
I must also make honourable mention of UD commenter ES58, who independently sent me a copy of the same post. [Cf. my previous post on my own deleted comment for his remarks. Thank you, ES58, for your public-spiritedness; your copy allows us to capture the timestamp. Thanks are also due to DS, the thread's owner, who kindly put the post back up temporarily, allowing GP to copy it for himself.]
It will be seen that this comment is in effect an informal review of the article by an informed person; indeed, by a concerned physician who has long been a highly respected commenter at the Uncommon Descent blog. (It should be noted that the writer is not a native English speaker.)
I highly commend it; indeed, it supplied what UD commenter PaV found lacking in what is now comment no. 58:
"no one is addressing Prof Olafsson’s paper." PaV, someone did: with responsible diligence, thoroughgoing solidity, and an eloquence that was passionate yet restrained and understated. He is one I am proud to own as a friend.
_______________
[Nov 26, 2008, UD thread on Prof PO's ID critique paper at http://www.uncommondescent.com/intelligent-design/some-thanks-for-professor-olofsson/ ]
[comment] 16 [deleted]
gpuccio
11/25/2008
6:38 pm
I have read Peter Olofsson’s essay on Talk Reason titled “Probability, Statistics, Evolution, and Intelligent Design” and, while recognizing that its general tone is appropriate, I am really disappointed by the incorrectness of its content. By this I obviously do not mean that PO does not know his statistics, but that he applies it completely out of context. And many of his errors seem to derive essentially from an apparent lack of real familiarity with biology.
I will try to make some comments.
PO’s arguments are addressed first to Dembski and then to Behe. I think he fails in both cases, but for different reasons.
In the first part, he argues against Dembski’s approach to CSI and his explanatory filter.
His first, and main, criticism is the following: “He presents no argument as to why rejecting the uniform distribution rules out every other chance hypothesis.”
I’ll try to explain the issue, as I see it, as simply as possible.
Dembski, for example, when dealing with biological issues like the sequence of amino acids in proteins, correctly assumes a uniform probability distribution. Obviously, such an assumption does not hold in every statistical problem, but Dembski states explicitly in his works that it is warranted when we have no specific information about the structure of the search space.
This is a statistical issue [e.g. cf here, here, here and here], and I will not debate it in general.
I will only affirm that, in the specific case of the sequence of amino acids in proteins, as it results from the sequence of nucleotides in the genome through the genetic code, it is the only possible assumption. We have no special reason to assume that specific sequences of amino acids are significantly more likely than others.
There can be differences in the frequency of single amino acids, due to the asymmetric, redundant nature of the genetic code or to differing probabilities of the individual mutations, but these obviously cannot be related to the space of functional proteins. There is really no reason to assume that functional sequences of hundreds of amino acids can be in any way more likely than non-functional ones. This is not a statistical issue, but a biological one.
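The codon-redundancy point can be made concrete with a short sketch. It assumes, purely for illustration, that all 61 sense codons of the standard genetic code are equally likely; real mutation spectra are not uniform. It shows only that single-residue frequencies depart from 1/20; it says nothing about which long sequences are functional.

```python
from fractions import Fraction

# Number of sense codons per amino acid in the standard genetic code
# (61 sense codons in total; the 3 stop codons are excluded).
CODONS_PER_AA = {
    "Ala": 4, "Arg": 6, "Asn": 2, "Asp": 2, "Cys": 2,
    "Gln": 2, "Glu": 2, "Gly": 4, "His": 2, "Ile": 3,
    "Leu": 6, "Lys": 2, "Met": 1, "Phe": 2, "Pro": 4,
    "Ser": 6, "Thr": 4, "Trp": 1, "Tyr": 2, "Val": 4,
}

def aa_frequencies(codon_counts):
    """Per-residue probability of each amino acid if every sense codon
    is equally likely (a simplifying assumption for illustration)."""
    total = sum(codon_counts.values())
    return {aa: Fraction(n, total) for aa, n in codon_counts.items()}

freqs = aa_frequencies(CODONS_PER_AA)
uniform = Fraction(1, 20)
for aa in ("Leu", "Ile", "Trp"):
    print(aa, freqs[aa], "vs uniform", uniform)
```

Exact fractions are used so the per-residue probabilities sum to exactly 1 and can be compared with 1/20 without rounding noise.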
So PO’s criticism may or may not have some theoretical grounding, but it is totally irrelevant empirically.
His second criticism is the following:
“As opposed to the simple Caputo example, it is now very unclear how a relevant rejection region would be formed. The biological function under consideration is motility, and one should not just consider the exact structure of the flagellum and the proteins it comprises. Rather, one must form the set of all possible proteins and combinations thereof that could have led to some motility device through mutation and natural selection, which is, to say the least, a daunting task.”
More generally, he asserts that Dembski does not explicitly state how to define the rejection region.
Let’s begin with the case of a single functional protein. Here the search space (under the perfectly warranted hypothesis of a practically uniform probability distribution) is simply the number of possible sequences of that length: for a 300 aa protein, say, 20^300, which is a truly huge space. But what is the “rejection region”? In other words, what is the probability of the functional target? That depends on the size of the set of functional sequences. What is that size, for a given protein length?
It depends on how we define the function.
We can define it very generically (all possible proteins of that length which are in some sense “functional”, in other words which can fold appropriately and have some kind of function in any known biological system). Or, more correctly, we can define it relative to the system we are studying (all possible proteins of that length which would have a useful, selectable function in that system). In the second case, the target set is certainly much smaller.
It is true, however, that nobody, at present, can exactly calculate the size of the target set in any specific case. We simply don’t know enough about proteins.
So we are left with a difficulty: to calculate the probability of our functional event, we have the denominator, the search space, which is enormous, but we don’t have the numerator, the target space.
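The numerator/denominator relation can be written as P = |T| / |S|, with |S| = 20^L for a protein of length L under the uniform assumption discussed above. The sketch below works in base-10 logarithms to avoid overflow; the target size passed in is a hypothetical placeholder, since, as the text says, nobody knows the real value. The function names are illustrative only.

```python
import math

def log10_search_space(length, alphabet=20):
    """log10 of the number of sequences of `length` residues
    over an alphabet of `alphabet` amino acids."""
    return length * math.log10(alphabet)

def log10_probability(log10_target, length, alphabet=20):
    """log10 of P = |target| / |search space| under a uniform distribution."""
    return log10_target - log10_search_space(length, alphabet)

exact = 20 ** 300                      # Python ints are arbitrary precision
log10_s = log10_search_space(300)
mantissa = 10 ** (log10_s % 1)         # leading factor of 20^300 in base 10
print(len(str(exact)), "decimal digits")
print(f"20^300 ~ {mantissa:.3f} * 10^{int(log10_s)}")
# Hypothetical target of 10^100 functional sequences, for illustration only:
print(f"log10 P = {log10_probability(100, 300):.3f}")
```

The exact integer and the logarithm agree: 20^300 has 391 digits and is about 2.037 × 10^390, the figure given in the note below.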
Should we be discouraged?
Not too much.
It is true that we don’t know the numerator exactly, but we can have perfectly reasonable ideas about its order of magnitude. In particular, we can be reasonably certain that the target space will never be big enough to bring the final probability above, to take one example, the threshold of Dembski’s UPB.
Not for a 300 aa protein. And a 300 aa protein is not a very long protein.
(For brevity I will not go into details here, but in this case the search space is 20^300 [NB: ~ 2.037*10^390; the UPB of odds less than 1 in 10^150 as the edge of reasonable probability is based on the fact that there are fewer than 10^150 quantum states of all atoms in the observable universe from its origin to its end, so odds longer than that exhaust its available probabilistic resources]; even if it were only 10^300, we would still need a target space of at least 10^150 functional proteins to reach a probability for the event of 1 in 10^150, and such a huge functional space is really inconceivable in the light of all that we know about the constraints on protein function.)
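The threshold arithmetic in that parenthetical can be checked in a few lines. This is a sketch of the arithmetic only: the 1 in 10^150 threshold is the UPB as stated in the note, and the function name is illustrative.

```python
import math

def min_log10_target(log10_search, log10_threshold):
    """Smallest log10 |target| for which |target| / |search| reaches
    the given probability threshold (both quantities in log10)."""
    return log10_search + log10_threshold

# Generous round-down used in the text: search space of 10^300,
# threshold of 1 in 10^150.
print(min_log10_target(300, -150))
# The same question for the full 20^300 space:
print(round(min_log10_target(300 * math.log10(20), -150), 3))
```

With the generous 10^300 figure the target must contain at least 10^150 sequences; with the full 20^300 space it must contain roughly 10^240.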
That reasoning becomes even more compelling if we consider not one protein, but a whole functional system like the flagellum, made of many long proteins interacting to produce a function. There, while it is true that we cannot calculate the exact size of the target space, proposing, as PO does, that it may be large enough to be even remotely relevant to our problem is pure imagination.
Again, I am afraid that PO has too vague a notion of real biological systems.
So, again, PO’s objections have some theoretical grounds, but are completely irrelevant empirically, when applied to the biological systems we are considering.
That is a common tactic of the Darwinian camp: since they cannot really counter Dembski’s arguments, they use mathematicians or statisticians to try to discredit them with technical and irrelevant objections, while ignoring the evident hole those same arguments have revealed in their position. PO should be more aware that here we are discussing empirical science and, more importantly, empirical biological science, which differs greatly from more exact sciences, like physics, in how statistical procedures are applied.
The last point against Dembski regards his arguments in favor of frequentist statistics against the Bayesian approach.
This part is totally irrelevant for those of us who are not pure statisticians. It will be enough to say that, in practically all biological and medical sciences, the statistical approach is Fisherian, and is based on the rejection of the null hypothesis.
So, Dembski is right for all practical applications.
Indeed, PO makes a rather strange assertion: “A null hypothesis H0 is not merely rejected; it is rejected in favor of an alternative hypothesis HA”. That is simply not true, at least in biology and medicine. H0 is rejected, and HA is tentatively affirmed if there is no other causal model that can explain the apparently non-random data. So the rejection of H0 is done on statistical grounds (the improbability of the data arising by chance), but the assertion of HA is done for methodological and cognitive reasons that have nothing to do with statistics.
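The Fisherian procedure described here can be sketched with an exact one-sided binomial test. The counts below are hypothetical, chosen only for illustration, and alpha = 0.05 is a conventional assumption. The code computes a p-value and a reject/do-not-reject decision and nothing else; weighing alternative causal explanations happens outside the calculation.

```python
from math import comb

def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): a one-sided exact p-value."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def fisher_test(k, n, p=0.5, alpha=0.05):
    """Return the p-value and whether H0 (success probability p) is rejected.
    The test only rejects (or fails to reject) H0; it names no HA."""
    p_value = binomial_upper_tail(k, n, p)
    return p_value, p_value < alpha

# Hypothetical data: 40 "successes" in 41 trials under H0: p = 0.5.
p_value, reject = fisher_test(40, 41)
print(f"p = {p_value:.3g}, reject H0: {reject}")
```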
The second part of PO’s essay is about Behe’s The Edge of Evolution (TEOE), and the famous problem of drug resistance in malaria.
Here, PO’s arguments are not only irrelevant, but definitely confused.
Here are some examples:
“The reason for invoking the malaria parasite is an estimate from the literature that the set of mutations necessary for chloroquine resistance has a probability of about 1 in 10^20 of occurring spontaneously.”
Yes, Behe makes that estimate from epidemiological data in the literature.
But he also points out that the likely reason for that empirical frequency is that chloroquine resistance seems to require at least two different coordinated mutations, and not only one, like resistance to other drugs. Indeed, Behe’s point is that the empirical occurrence of the two kinds of resistance is in good accord with the theoretical probability for a single functional mutation and a double coordinated functional mutation.
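The gap between one required mutation and two coordinated ones can be sketched numerically. The sketch assumes the two mutations arise independently, so their probabilities multiply; the per-mutation value is a hypothetical placeholder, not a figure from Behe or from PO.

```python
import math

def joint_probability(*per_mutation_probs):
    """Probability that every listed mutation occurs, assuming the
    mutations arise independently (so the probabilities multiply)."""
    return math.prod(per_mutation_probs)

# Hypothetical per-mutation probability, chosen only for illustration;
# it is not a figure taken from Behe or from this text.
p_single = 1e-10

one_step = joint_probability(p_single)             # one required mutation
two_step = joint_probability(p_single, p_single)   # two coordinated mutations
print(f"{one_step:.0e} vs {two_step:.0e}")
print(f"ratio: {one_step / two_step:.0e}")
```

Under independence, each additional required mutation multiplies the overall probability by another small factor, which is why a two-mutation requirement can be many orders of magnitude rarer than a single mutation.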
Again, PO seems to be blind to the biological aspects of the problem.
“Any statistician is bound to wonder how such an estimate is obtained, and, needless to say, it is very crude. Obviously, nobody has performed huge numbers of controlled binomial trials, counting the numbers of parasites and successful mutation events.”
But Behe’s evaluation is epidemiological, not experimental, and that is a perfectly valid approach in biology.
“Rather, the estimate is obtained by considering the number of times chloroquine resistance has not only occurred, but taken over local populations — an approach that obviously leads to an underestimate of unknown magnitude of the actual mutation rate, according to Nicholas Matzke’s review in Trends in Ecology & Evolution.”
Here PO seems to realize, somewhat late, that Behe’s argument is epidemiological, and so at last he makes a biological argument. Not a very relevant one, and an argument from authority (Matzke, just to be original!). But yes, there may be some underestimation in Behe’s reasoning. Or some overestimation. That’s the rule with epidemiological and biological hypotheses: nobody has absolute truth.
“Behe wishes to make the valid point that microbial populations are so large that even highly improbable events are likely to occur without the need for any supernatural explanations.”
No, he only makes the correct point that random events are more likely to occur in large populations than in small ones, provided, of course, that they are not too “highly improbable”. In other words, a coordinated functional mutation of two amino acids “can” occur (and indeed does occur, although rarely) in the malaria parasite, but it is almost impossible in humans.
What has that to do with supernatural explanations? [NB: Cf my discussion here on the misleading contrast natural/supernatural vs the relevant one: natural/artificial, and the underlying materialist agenda that is too often at work, here.]
“But his fixation on such an uncertain estimate and its elevation to paradigmatic status seems like an odd practice for a scientist.”
Uncertain estimates are certainly not an odd practice for a biologist. And anyway, Behe does not elevate his estimate to “paradigmatic status”: he just tries, with the available data, to investigate a quantitative aspect of biological reality which Darwinists have always left in the dark (conveniently for them, I would say).
“He then goes on to claim that, in the human population of the last 10 million years, where there have only been about 10^12 individuals, the odds are solidly against such an unlikely event occurring even once.”
For once, that’s correct.
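The arithmetic behind the quoted claim can be checked under the simplifying assumption that each individual is an independent trial with the same probability p. The 10^12 figure comes from the quoted passage; the 10^20 population is a hypothetical comparison point, and the function name is illustrative.

```python
import math

def prob_at_least_once(p, n):
    """P(at least one occurrence) among n independent trials, each with
    probability p: 1 - (1 - p)**n, computed stably for tiny p."""
    return -math.expm1(n * math.log1p(-p))

p = 1e-20
humans = prob_at_least_once(p, 1e12)     # population size quoted in the text
large_pop = prob_at_least_once(p, 1e20)  # hypothetical population of 10^20
print(f"n = 10^12: {humans:.3g}")
print(f"n = 10^20: {large_pop:.3g}")
```

With n·p = 10^-8 the chance of even one occurrence is about one in a hundred million, while n·p = 1 gives about 63% (that is, 1 - e^-1). `log1p` and `expm1` are used because computing 1 - p directly would round to exactly 1.0 for p this small.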
“On the surface, his argument may sound convincing.”
It is convincing.
“First, he leaves the concept “complexity” undefined — a practice that is clearly anathema in any mathematical analysis.”
That’s not true. He is obviously speaking of the complexity of a functional mutation which needs at least two coordinated mutations, like chloroquine resistance. That is very clear if one reads TEOE.
“Thus, when he defines a CCC as something that has a certain “degree of complexity,” we do not know of what we are measuring the degree.”
The same misunderstanding. We are talking about mutational events which require at least two coordinated mutations to be functional, like chloroquine resistance, and which in the natural model of the malaria parasite seem to occur with an approximate empirical frequency of 1 in 10^20.
“As stated, his conclusion about humans is, of course, flat out wrong, as he claims no mutation event (as opposed to some specific mutation event) of probability 1 in 10^20 can occur in a population of 10^12 individuals (an error similar to claiming that most likely nobody will win the lottery because each individual is highly unlikely to win).”
Here the confusion is complete. Behe is saying a very simple thing: that a “functional” mutation of that type cannot be expected in a population of 10^12 individuals. PO, like many others, equivocates on the concept of CSI ([with bio-] functional specification [being particularly in view]) and brings out, for the nth time, the infamous “deck of cards” or “lottery” argument (improbable things do happen; OK, thank you, we know that).
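The distinction drawn here, between the certainty that some outcome occurs and the improbability of hitting a target specified in advance, can be shown with a toy model. It assumes uniform, independent draws; the function name is illustrative, and the model does not represent mutation or selection.

```python
import math

def p_hit_target(target_size, space_size, draws):
    """Chance that at least one of `draws` independent uniform draws from
    `space_size` outcomes lands in a pre-specified target set."""
    q = target_size / space_size
    if q >= 1:
        return 1.0                     # the target is the whole space
    return -math.expm1(draws * math.log1p(-q))

N = 10**6  # a lottery with one million numbers

# "Some number is drawn": the target is every possible outcome.
print(p_hit_target(N, N, 1))
# "A number named in advance is drawn": the target is a single outcome.
print(p_hit_target(1, N, 1))
```

The first probability is certain by construction, whatever the individual odds; only the second, with the target fixed beforehand, stays small.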
“Obviously, Behe intends to consider mutations that are not just very rare, but also useful,”
Well, maybe PO understands the concept of CSI, after all. [NB: cf. "useful" and "[bio-] functional."] But then why does he speak of “error” in the previous sentence?
“Note that Behe now claims CCC is a probability; whereas, it was previously defined as a mutation cluster”
That’s just being fastidious. OK, Behe meant the probability of that cluster…
“A problem Behe faces is that “rarity” can be defined and ordered in terms of probabilities; whereas, he suggests no separate definition of “effectiveness.” For an interesting example, also covered by Behe, consider another malaria drug, atovaquone, to which the parasite has developed resistance. The estimated probability is here about 1 in 10^12, thus a much easier task than chloroquine resistance. Should we then conclude atovaquone resistance is a 100 million times worse, less useful, and less effective than chloroquine resistance? According to Behe’s logic, we should.”
Here I cannot find any logic at all. What does that mean? Atovaquone resistance has an empirically estimated probability of 1 in 10^12, which is consistent with the fact that it depends on a single amino acid mutation. What does that have to do with “usefulness”, “effectiveness”, and all the rest?
“But, if a CCC is an observed relative frequency, how could there possibly have been one in the human population? As soon as a mutation has been observed, regardless of how useful it is to us, it gets an observed relative frequency of at least 1 in 10^12 and is thus very far from acquiring the magic CCC status.”
Here PO goes mystical. CCC is an observed relative frequency in the malaria parasite. That’s why we cannot reasonably “expect” that kind of mutation to be an empirical cause of functional variation in humans. What is difficult about that? Obviously, we are assuming that the causes of random mutations are similar in the malaria parasite and in humans, unless PO wants to suggest that humans are routinely exposed to hypermutation.
“Think about it. Not even a Neanderthal mutated into a rocket scientist would be good enough; the poor sod would still decisively lose out to the malaria bug and its CCC, as would almost any mutation in almost any population.”
I have thought about it, and still can find no meaning in such an affirmation. The point here is not a sporting competition between the malaria parasite and the human race. We are only testing scientific hypotheses.
“If one of n individuals experiences a mutation, the estimated mutation probability is 1/n. Regardless of how small this number is, the mutation is easily attributed to chance because there are n individuals to try. Any argument for design based on estimated mutation probabilities must therefore be purely speculative.”
That’s just the final summary of a long paragraph which seems to make no sense; PO seems to miss the point. We have two different theories which try to explain the same data (biological information). The first one (Darwinian evolution) relies heavily on random events as causal factors. Therefore, its model must be consistent with statistical laws, both theoretically and empirically.
Behe has clearly shown that that is not the case.
His observations about true Darwinian events (microevolution due to drug pressure) in the malaria parasite, both theoretical (the number of required coordinated functional mutations and the calculation of the relative probabilities) and empirical (the frequency of occurrence of those mutations in epidemiological data), are in accord with a reasonable statistical model.
The same model, applied to humans, cannot explain the important novel functional information that humans exhibit vs their assumed precursors.
Therefore, that functional information cannot be explained by the same model which explains drug resistance in the malaria parasite.
Does that seem clear?
It is.
In the end, I will borrow PO’s final phrase:
“Careful evaluation of these arguments, however, reveals their inadequacies.”
___________
I trust that this informal review is as helpful to the reader as it has been to me. END