More about the 2005 yearling study
After reading email responses to my study of Triple Plus yearlings in the 2005 Keeneland September Sale, which I reported in my last blog entry and which has been quoted in WTC ads, I knew that some clarification of the meaning of that study was called for. Sarah Wells writes:
“As a former mathematics teacher, I must question the validity of using the entity that created or significantly contributed to a particular statistic (rating) to verify that same statistic (rating).”
Quite right, Sarah. In that study I used current figures to identify the 281 Triple Plus yearlings in the sale and then summarized their subsequent performance. The performance of some of those 2005 yearlings did, in fact, contribute to those Triple Plus ratings, and their contributions had mostly favorable effects on the percentages of winners, stakes winners, graded stakes winners, etc. found in that group.
This means you should not expect the Triple Plus yearlings in the 2009 Keeneland September Sale to do as well as the Triple Plus yearlings in the 2005 sale. That is not going to happen, precisely because, as Sarah’s email suggests, those 2009 yearlings have not had a chance to contribute to the current Triple Plus ratings.
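For the statistically inclined, here is a minimal sketch in Python of the effect Sarah is pointing at. Everything in it is invented, the crosses, the rates, and the toy rating rule alike, but it shows the mechanism: fold a cohort’s own results into a rating, and that rating will flag the cohort’s winners a little more readily than a rating built without them.

```python
# A toy illustration (all numbers invented) of the circularity Sarah describes:
# a rating computed from a record that includes a cohort's own results will
# flag that cohort's winners more readily than a rating computed without them.
import random

random.seed(1)

# 200 hypothetical sire-line crosses, each with an unknown true stakes-winner rate.
true_rate = {c: random.uniform(0.02, 0.20) for c in range(200)}

def run_foals(cross, n):
    """Simulate stakes-winner outcomes (True/False) for n foals of a cross."""
    return [random.random() < true_rate[cross] for _ in range(n)]

history = {c: run_foals(c, 30) for c in true_rate}  # the pre-sale record
cohort = {c: run_foals(c, 5) for c in true_rate}    # the "2005 yearlings"

def triple_plus(record, threshold=0.12):
    """Toy rating rule: flag a cross whose observed rate clears the threshold."""
    return {c: sum(r) / len(r) >= threshold for c, r in record.items()}

flags_2005 = triple_plus(history)  # built before the cohort ran
flags_2009 = triple_plus({c: history[c] + cohort[c] for c in true_rate})

def cohort_rate(flags):
    """Stakes-winner rate among cohort foals from flagged crosses."""
    outcomes = [w for c in true_rate if flags[c] for w in cohort[c]]
    return sum(outcomes) / len(outcomes)

print(f"cohort rate under the 2005-style rating: {cohort_rate(flags_2005):.3f}")
print(f"cohort rate under the 2009-style rating: {cohort_rate(flags_2009):.3f}")
# The second number tends to run higher: the cohort's own wins helped decide
# which crosses got flagged. That is Sarah's point, in miniature.
```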
What Sarah says in her email pertains only to studies that are predictive in purpose. Put another way, what she says is that it’s nonsense to suppose that you can predict something that’s already happened. It’s like past-post betting. However, no study that WTC has presented and no ad that it has published has ever represented its nick rating as a predictive instrument. What WTC has claimed, and what I have argued on WTC’s behalf, is not that its nick rating system is effective at predicting future performance but that it is effective at reflecting past performance. That is implicit in my observation that current Triple Plus ratings “captured” ten of the G1 winners that came out of that sale.
If I had been trying to find out how effectively the Triple Plus predicts performance, then the study I designed would be deeply flawed, but that is not what I was trying to do. I was trying to find out how effectively it identifies crosses that have worked in the past, including the subsequent performance of the yearlings in that sale.
That is not to say that Sarah’s criticism is misguided. Not at all. She would rather have a measure that predicts the future than have one that reflects the past, and so would I. It’s just that nick ratings don’t do that, at least not in the purely statistical sense Sarah has in mind. It’s not their job.
I would stipulate that the form of the study I designed might have contributed to the misunderstanding. The same design might be used to measure predictive power, except that the Triple Plus yearlings would have been identified four years prior to the survey of their subsequent performance. That would seem a fine thing, but it is not what a nick rating is supposed to do or can do.
Just for Sarah, I recently rolled the WTC system back to its 2005 form and applied the Triple Plus technology, but not to see how well it could predict outcomes. I already know a nick rating system can’t do that. The only purpose that exercise can have is to see how well the 2005 ratings reflect performance on the basis of an incomplete record. What I found is that the 2005 Triple Plus rating captures runners and winners just about as effectively as the 2009 Triple Plus rating does. It does only 50-60% as well at reflecting stakes performance at all levels, but that is hardly surprising. The 2009 system does better than the 2005 system only because it is reflecting a more complete record. What either system reflects can in no way be regarded as prediction.
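If it helps to see the shape of that comparison, here is the “capture rate” arithmetic in sketch form, with invented horse ids and outcome sets standing in for the real data:

```python
# A sketch (hypothetical data) of the rollback comparison: for each rating
# snapshot, what share of the sale's eventual performers at each level came
# from crosses that snapshot flagged as Triple Plus?
def capture(flagged, performers):
    """Share of performers whose pedigrees the snapshot's flags include."""
    return len(flagged & performers) / len(performers)

# Invented outcome sets for a sale cohort.
winners = {"h1", "h2", "h3", "h4", "h5", "h6", "h7", "h8"}
stakes_winners = {"h2", "h4", "h6", "h8"}

snapshots = {
    "2005 (incomplete record)": {"h1", "h2", "h3", "h5", "h6", "h7"},
    "2009 (fuller record)": {"h1", "h2", "h3", "h4", "h5", "h6", "h8"},
}
for label, flagged in snapshots.items():
    print(f"{label}: winners {capture(flagged, winners):.0%}, "
          f"stakes winners {capture(flagged, stakes_winners):.0%}")
# With these made-up sets, the two snapshots run close on winners (75% vs 88%)
# and the older one does about half as well on stakes winners (50% vs 100%),
# which is the general pattern the rollback showed.
```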
Now, when I say that a nick rating system doesn’t predict the future, I only mean that it doesn’t do so in a systematic way. Breeders and buyers use nick ratings because they expect the future to repeat the past, and so it does. Why is it that human beings, both as societies and as individuals, will declare with absolute assurance, “I’ll never make that mistake again,” and then continue to repeat the same mistake over and over? It’s not just that we are inattentive to the past, as the maxim has it, but also that the new situations unfolding before us don’t look the same as they did before. There is a maxim for this, too: “You cannot step into the same river twice.” The problem is that, by the time thoroughbred performance gets round to repeating itself, circumstances have changed.
Consider the following cases:
In 2005, the Tiznow-Storm Cat cross would have been a Triple Plus if we’d had it back then. In the 2005 Keeneland September sale there were five yearlings bred from that cross, including G2 winners Informed and Tiz Wonderful, along with two other winners. If you’d bought all of those Triple Plus Tiznows, you’d look like a genius. In 2009, it’s an even stronger Triple Plus, and it’s probably going to be a Triple Plus for a long time to come. That seems a great thing to know.
The only problem is that the 2008 Keeneland September sale offered 20 yearlings bred from that cross. Buyers at that sale stepped into an entirely different river. There is no way those 20 Triple Plus yearlings will match the 40% G2 strike rate (two winners from five foals) that the 2005 group posted. In that sense, the Triple Plus rating is far from predictive, but, if you had to draw one of the 60 Tiznows in that sale out of a hat, you couldn’t help hoping that it would be one of the 20 produced by Storm Cat-line mares.
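For readers who want the arithmetic behind that claim, here is a quick binomial check. The 10% “true” G2-winner rate is my own assumption, a generous one, used purely for illustration:

```python
# Why a 40% strike rate from five foals should not be projected onto twenty:
# assume (generously, and purely for illustration) a true G2-winner rate of 10%.
from math import comb

def binom_tail(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = 0.10
print(binom_tail(2, 5, p))   # 2+ G2 winners among 5 foals: ~0.08, unlikely but it happens
print(binom_tail(8, 20, p))  # 8+ (40%) among 20 foals: ~0.0004, effectively never
# Small samples throw up gaudy percentages; larger ones regress toward the true rate.
```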
In 2005 the A.P. Indy-Mr. Prospector cross would have been rated Triple Plus, based on 32 stakes winners bred from the cross, including eight G1 winners and six G2 winners. It’s important to note, though, that 25 (78%) of those stakes winners were by A.P. Indy himself. By contrast, of the 31 Triple Plus 2005 yearlings representing that cross, only 7 (23%) were by A.P. Indy himself; the rest were sired by various sons of his. The difference between the breeding of those Triple Plus yearlings and the breeding of the stakes winners on which the rating was based is very nearly an apples-and-oranges difference.
Statistical prediction simply cannot be done reliably on that basis, and, again, I’m referring to the kind of prediction Sarah has in mind: prediction by statistical inference. It happens, in fact, that the A.P. Indy-Mr. Prospector cross is not rated Triple Plus in 2009, in some significant measure because of the subsequent performance of those 2005 yearlings as a group. A nick rating system just can’t know which are apples and which are oranges until the fruit ripens.
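To put rough numbers on the apples-and-oranges problem, treat the 78% and 23% figures, for illustration only, as the A.P. Indy share of each group, and assume (my numbers, not WTC’s) that his own foals hit at 12% and his sons’ foals at 4%. Each group’s expected rate is just the weighted average of the two, and the averages are not close:

```python
# Assumed rates, invented for illustration: A.P. Indy's own foals vs his sons'.
rate_sire, rate_sons = 0.12, 0.04

def expected_rate(sire_share):
    """Expected stakes-winner rate for a group with the given A.P. Indy share."""
    return sire_share * rate_sire + (1 - sire_share) * rate_sons

print(f"record behind the rating (78% by A.P. Indy): {expected_rate(0.78):.1%}")
print(f"2005 yearling group (23% by A.P. Indy): {expected_rate(0.23):.1%}")
# 10.2% vs 5.8%: the rating was earned by one mixture and applied to another.
```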
As real as nicks are, they don’t go in all generational directions. The A.P. Indy line has established solid Triple Plus ratings with certain branches of the Mr. Prospector line, based in part on the performance of certain sons of A.P. Indy, not all of them. Nick ratings are always necessarily behind the curve of the population, so much so that they cannot make reliable predictions.
So, all I wanted was to find out how well the new Triple Plus rating captures the best of past performance, and what I discovered is astonishing. It’s one thing to say that a vastly higher proportion of A nicks are found among stakes winners than among pedigrees in the general population. That’s been the standard claim for a nick rating up to now, but that 2005 yearling study sets a different standard. No nick rating that I know of has ever been up to that rigorous a test of its ability to associate sire-line crosses with a history of very high performance.
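To make that standard claim concrete, the arithmetic behind it is a simple lift calculation. The counts below are made up:

```python
# Invented counts, for illustration: the usual form of a nick-rating claim.
sw_with_a_nick, sw_total = 400, 1_000        # stakes winners with an A-or-better nick
pop_with_a_nick, pop_total = 1_500, 10_000   # general-population pedigrees likewise

sw_rate = sw_with_a_nick / sw_total
pop_rate = pop_with_a_nick / pop_total
print(f"A-nick rate among stakes winners: {sw_rate:.0%}")  # 40%
print(f"A-nick rate in the population: {pop_rate:.0%}")    # 15%
print(f"lift: {sw_rate / pop_rate:.2f}x")                  # 2.67x
```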
Prior to the development of the Triple Plus, no nick rating could have identified a Keeneland September sample that contained 18.8% stakes winners, much less a sample of 281 that contained ten G1 winners. When it comes to profiling high-performance crosses, the Triple Plus plays a game for which no other publicly available pedigree rating could even take the field.
That’s what I meant to convey. As for the future, well, that’s anybody’s guess, but for my money, I’ll take what’s worked in the past, based on the best measure of it that’s ever been taken.