
Sister Sarah Games the Fix

by Roger Lyons

My last post demonstrates the internal dynamics of the dam line index (DLI), with its seven-generation search limit and its survey of the last fifteen years of SWs (in the original analysis). Narrowing the survey parameter to SWs of the last 10 years confirmed the expectation that the number of SWs tracing to early-era dams would decrease substantially while their average generational distance would increase somewhat–a formula for declining DLIs of early-era dams relative to the late-era dams. Accordingly, the early-era dams moved down in rank as the late-era dams moved up, very dramatically in many cases.

But, as Greg Michalson comments, Sister Sarah (Abbots Trace-Sarita, by Swynford), an important dam born in 1930, proved non-compliant. She actually moved up in rank from eighth to sixth. The fix against her is strong statistical medicine, and, in order to game it, she would need to pick up a one- or two-generation advantage somewhere along the line over other older mares. Baby League, for example, born five years later than Sister Sarah, dropped from 14th to 18th.

Nothing proves a powerful rule like a notable exception, and Sister Sarah is–well–no exception. The numbers provide presumptive ground for suggesting that this family might be more capable than most other families of producing quality in old age. I can’t think of any other way Sister Sarah could beat the statistical fix I put her in.

Even her first generation suggests this. Her second-most substantial contributor to SWs since 2000–behind Sybil’s Sister (1943)–was her daughter, The Veil, born when Sister Sarah was 23 and the latest-born branch of the line. The Veil is represented within six generations (which puts Sister Sarah in the seventh) by 26 SWs in the 10-year group, down from 36 in the 15-year group. The late coming of The Veil assured Sister Sarah a generational advantage over other mares of her era when the first decade of the 21st century rolled around.

Now, Sister Sarah’s daughter Lady Sybil was born in 1940, when Sister Sarah was only ten years old. Even so, Lady Sybil retained a large proportion of SWs in the 10-year survey. Of her 22 SWs in the 15-year group that had Sister Sarah within seven generations, 15 also qualified in the 10-year group. That’s the branch from which Workforce (2007) descends–his sixth dam is Sister Sarah’s daughter, Lady Sybil.

Lady Sybil had a 1952 daughter named Esquire Girl, by My Babu, which produced Workforce’s fourth dam, Sounion (1961), by Vimy. At age 20, Sounion produced the 1981 filly Media Luna, by Star Appeal and the third dam of Workforce. By virtue of Sounion’s production of Media Luna in old age, Workforce and eight other winners of stakes run after 1999 descend from Sister Sarah within the seven-generation limit.

No doubt, such occurrences can be found along the branches descending from any great dam, but the numbers suggest that, more than is usual, descendants of Sister Sarah have been able to produce good producers late in their breeding careers.

Dam Lines Unplugged

by Roger Lyons

In my last post about comparing the contemporary influence of competing dam lines, I explained that the measure of their influence (Dam Line Index, or DLI) is a function of 1) the number of SWs descending in direct female descent of the given dam (S) and 2) the average generational distance of the dam from those SWs (G). Or S / G = DLI. That function yields a measure that roughly reflects the relative influence of different dams of different eras.

I also mentioned that I go back only seven generations in search of any given dam–say, La Troienne–even though some contemporary SWs trace to La Troienne further back than that. Setting a maximum generational distance is important because it limits the extent to which the DLI is biased in favor of early-era dams. Going back eight or nine generations would increase the number of SWs descending from La Troienne more than it would increase the average generational distance.

Now, what I didn’t mention in that post is that there’s another factor related to that same bias: the historical limit that is set on the population of SWs surveyed. The tables published in my last post are based on SWs of stakes run from 1995 to the present. But, what if we lopped off the earliest five years and just used SWs from 2000 to the present?

Well, fewer of the more recent SWs would be traceable to older dams within the maximum seven-generational range. That means using only the more recent SWs tends to tilt the bias in favor of the later-era dams, just the opposite of what happens if you increase the generational distance of the survey.
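To make the two opposing biases concrete, here is a minimal Python sketch on invented data. The record layout, the `dli` helper, and every name and year in `sws` are hypothetical and not drawn from the actual survey; the only things taken from the posts are the S / G formula and the idea of a cap on generational distance.

```python
# Hypothetical toy data: (year the stakes was run, tail-female line of the SW,
# listed from first dam backward). All names and years are invented.
sws = [
    (1996, ["a1", "a2", "OldMare"]),
    (1997, ["b1", "b2", "OldMare"]),
    (1998, ["c1", "c2", "OldMare"]),
    (1999, ["d1", "d2", "d3", "OldMare"]),
    (2003, ["e1", "e2", "e3", "e4", "OldMare"]),
    (2009, ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "OldMare"]),
    (2004, ["g1", "YoungMare"]),
    (2008, ["h1", "YoungMare"]),
]

def dli(dam, records, first_year, max_generations=7):
    """Return (S, G, DLI) for one dam under the given survey parameters."""
    distances = []
    for year, female_line in records:
        if year < first_year:
            continue  # stakes run before the survey window opens
        searched = female_line[:max_generations]
        if dam in searched:
            distances.append(searched.index(dam) + 1)  # 1 = first dam
    s = len(distances)
    if s == 0:
        return 0, 0.0, 0.0
    g = sum(distances) / s
    return s, g, s / g

# Shortening the survey window hurts the older mare; the younger one is unaffected.
for first_year in (1995, 2000):
    for dam in ("OldMare", "YoungMare"):
        s, g, index = dli(dam, sws, first_year)
        print(first_year, dam, s, round(g, 2), round(index, 2))

# Extending the search to eight generations picks the older mare's 2009 SW back up.
print(dli("OldMare", sws, 2000, max_generations=8))
```

In this toy set the older mare leads on the 1995 window and trails on the 2000 window, while an eighth generation of search gives some of her lost ground back.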

Okay, I realize this is kind of wonkish, but, if you compare the rank order based on stakes run since 1995, here, with the alternate rank order based on stakes run only since 2000, here, you’ll get a sense of what is at stake in how these statistics are done.

The range of the alternate DLIs is compressed, such that La Troienne’s lead is diminished. Some of the later-era dams have displaced some of the earlier-era dams in the top spots. Helene de Troie, the dam of La Troienne and ranked 6th on the original list, is now ranked 23rd. Grey Flight, nowhere to be found on the original list, is now ranked 34th. Best in Show, the latest-born of all the top dam lines, at an average of only four generations removed, has edged up from 10th to 8th.

Even more indicative is that Urban Sea, to which I referred in my last post as ranking 362nd on the original list (not in the top-40 table), ranks 154th on the alternate list.

Basically, these two issues–how many generations are surveyed and how recent the pool of SWs–bear on the question of currency, and, to the extent that currency is valued above other considerations in the assessment of dam lines, less turns out to be more.

Measuring Dam Lines

by Roger Lyons

Every few years, around November sale time, I survey female lines to see how their rankings have shifted over time. The terms “survey” and “ranking” require definition. This time round, I pulled out the winners of stakes, as compiled by WTC, that were run from 1995 to the present and tabulated every occurrence of every dam in the female lines of these SWs going back seven generations, along with the generational distance of each occurrence.

Then I crunched the numbers as usual to find out for each dam 1) how many SWs were descended from her and 2) the average generational distance of her occurrence in the female lines of those SWs. That’s all you need in order to get a rough idea of relative contemporary influence because, if you divide the number of SWs by the average generational distance at which the dam occurred in the female lines, then you end up with an index that you can use to rank the dams in a more or less valid way–based on the number of SWs per average number of generations removed. Let’s call it the Dam Line Index (DLI).
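For readers who want to see the tabulation spelled out, here is a rough Python sketch of the steps just described. It is not the actual program behind the survey: the input layout (`female_lines`), the `rank_dams` helper, and the mare names are all invented for illustration.

```python
from collections import defaultdict

MAX_GENERATIONS = 7  # the search limit described in the post

# Hypothetical input: one tail-female line per SW, nearest dam first.
female_lines = [
    ["Mare A", "Mare B", "Mare C"],
    ["Mare D", "Mare B", "Mare C"],
    ["Mare E", "Mare C"],
]

def rank_dams(lines, max_generations=MAX_GENERATIONS):
    """Tabulate every dam's occurrences, then rank the dams by DLI."""
    distances = defaultdict(list)
    for line in lines:
        # Record the generational distance of each dam within the limit.
        for generation, dam in enumerate(line[:max_generations], start=1):
            distances[dam].append(generation)
    table = []
    for dam, gens in distances.items():
        sw_count = len(gens)                  # number of SWs descending from her
        avg_distance = sum(gens) / sw_count   # average generational distance
        table.append((dam, sw_count, avg_distance, sw_count / avg_distance))
    # Highest index first; ties keep the order in which dams were first seen.
    return sorted(table, key=lambda row: row[3], reverse=True)

for dam, sw_count, avg_distance, index in rank_dams(female_lines):
    print(f"{dam}: {sw_count} SWs at {avg_distance:.2f} generations, DLI {index:.2f}")
```

In the toy data, Mare C comes out on top with three SWs at an average of 2.67 generations.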

For example, La Troienne, which ranks highest among the 70,659 individual dams represented, occurs in the female line (within seven generations) of 298 SWs (since 1995) at an average of 6.42 generational removes. That means her DLI is 298 / 6.42 = 46.42, which is the number of SWs descended from her per generation.

The average generational distance is useful because it controls to some extent for the differential opportunity of mares of different eras. Best in Show, for example, which ranks 10th, occurred in the female lines of 78 SWs and at an average generational distance of only 3.79 generations. So, the index of her influence is 20.58 SWs per generation. That’s a lot less than La Troienne, but not that much less than Escutcheon, which has the second-highest rank, at 28.21 and an average generational distance of 6.38.
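The arithmetic behind those figures is easy to check. The short Python sketch below uses only the numbers quoted above. The helper name `dam_line_index` is my own, and Escutcheon's SW count is backed out from her published index and distance rather than taken from the tables.

```python
def dam_line_index(sw_count, avg_distance):
    """DLI = number of SWs divided by average generational distance."""
    return sw_count / avg_distance

la_troienne = dam_line_index(298, 6.42)
best_in_show = dam_line_index(78, 3.79)
print(round(la_troienne, 2))   # 46.42
print(round(best_in_show, 2))  # 20.58

# Escutcheon's SW count is not quoted in the post; this is only what her
# published DLI (28.21) and average distance (6.38) imply.
implied_escutcheon_sws = round(28.21 * 6.38)
print(implied_escutcheon_sws)  # 180
```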

Now, there’s always going to be someone who says (without thinking) that seven generations is not enough since La Troienne occurs beyond the seventh generation of some contemporary SWs. That is an untenable position, though, because, if you extend that rationale across the population of dams (not just La Troienne), there can be no generational limit that will satisfy them all.

Besides, the method already has an inherent bias in favor of the older dams. Consider that Urban Sea, dam of sires Galileo and Sea the Stars, plus other high-class sons and daughters, ranks last (362nd) among dams with a DLI of at least 7.0. She is the dam of seven SWs, but has not had the advantage of subsequent generations through which to multiply her influence. She’s almost certainly bound to be better as a tail-female influence than she ranks now, and increasing the maximum generational distance of the survey would only serve to exaggerate the bias against her.

Click here to view the alphabetical list of top-40 dam lines and here to view the same list in rank order.

More about this topic in subsequent posts.

Zenyatta vs. Joe DiMaggio

by Roger Lyons

Hands down, the greatest streak in sports is Joe DiMaggio’s 56-game hitting streak in the season of 1941. I say this advisedly–advised, that is, by an article written by Stephen Jay Gould for The New York Review of Books (August 18, 1988), which I read back then and managed to track down for this occasion in that publication’s online archive. In it, Gould cites Nobel laureate in physics, Edward Mills Purcell, who conducted probability analyses of all recorded streaks in baseball to find out which ones had been likely to occur all along and which ones were utterly improbable. He found that all “fell victim to the laws of probability,” except for Joe DiMaggio’s 56-game hitting streak.

That streak is so improbable, Purcell concludes, that “to make it likely (probability greater than 50%) that a run of even 50 games will occur in the history of baseball up to (then), baseball rosters would have to include either four lifetime .400 batters or 52 lifetime .350 batters over careers of one thousand games.” Nobody has ever hit .400 lifetime, and, then as now, only three major league players have had lifetime batting averages over .350–Ty Cobb, Rogers Hornsby, and Joe Jackson. Joe DiMaggio hit only .325 lifetime.

So, there you are. The question is, how many consecutive races would Zenyatta have to win in order to run up a streak as improbable as that?

I wouldn’t know how to get the right answer, but let’s look at it this way. The nearest any player has approached DiMaggio’s 56-game hitting streak was Pete Rose’s 44-game streak in 1978, not counting the old record of 45 games set by Willie Keeler across the 1896 and 1897 seasons–a different era. DiMaggio’s streak was 1.27 times as long as that of Pete Rose.

But we know from Purcell’s study that a 44-game hitting streak has at least a 50% probability. Every once in a while someone is going to achieve it even though nobody except Pete Rose has done so lately (kind of like the Triple Crown). Given past racing streaks, I think it’s fair to assume that winning 18 consecutive races–she’s won 19 already–is probable enough to be comparable to a 44-game hitting streak. If so, it would mean Zenyatta could equal DiMaggio’s streak by winning 23 consecutive races.
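The back-of-the-envelope scaling can be written out in a few lines of Python. This is nothing more than the proportional reasoning above, not a probability model. The 18-race baseline is the assumption stated in the text, and the variable names are mine.

```python
import math

dimaggio_streak = 56   # games, 1941
rose_streak = 44       # games, 1978
baseline_wins = 18     # assumed racing counterpart of a 44-game hitting streak

# How much longer the 56-game streak was than the nearest modern approach.
ratio = dimaggio_streak / rose_streak
# Scale the racing baseline by the same ratio and round up to whole races.
races_needed = math.ceil(ratio * baseline_wins)

print(round(ratio, 2))  # 1.27
print(races_needed)     # 23
```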

But it can’t be that simple. She has the very great advantage of entering the starting gate at a chosen time and place, and that depends to some extent on how she’s training. DiMaggio had to step up to the plate every day, wherever the Yankees happened to be playing. Again, I don’t know the technically correct way to do this, but it seems to me Zenyatta would have to win, say, 25–maybe 26–consecutive races in order to be in the same ballpark as Joltin’ Joe.

Greenspan to the Rescue

by Roger Lyons

A recent Blood-Horse blog post, posted on September 27, drew the usual parallel between the performance of the stock market and demand in the commercial market for thoroughbreds. The post was occasioned by a USA Today report of an Alan Greenspan speech in which he argued that, as the blog post put it, “a sustained stock rally would be the ‘most effective’ stimulus for the sluggish economy.” As the post goes on to explain, “a rising stock market makes people feel richer, it inspires confidence, and it signals optimism in the future.” The point was to suggest that what’s good for the stock market is good for commercial breeders.

Feel-good macroeconomics has its place, but I’ve never felt that good about it. Maybe it’s because, around the time I first started hearing about the trickle-down theory–while studying at the University of Kentucky and doing odd jobs in the thoroughbred industry–I got my first Form 1099 ever, and it soaked up about everything that had trickled down to me. That was within the first year or two of the Reagan administration. I imagine he had economists who thought public policy that lavishly enriched the investor class would unleash such a mighty torrent on the rest of us that they needed to build a dam to stop the deluge (for our own good!), but, instead, the trickle just dried up.

And that’s when they started talking about supply-side economics–the if-you-build-it-they-will-come theory of macroeconomics. It was a reversion to the classical doctrine that supply creates its own demand–because, if people are employed in making stuff, they will be able to buy it. Only this time round, economic policy was in thrall to Alan Greenspan’s Ayn Rand fantasy, in which the invisible hand of the market doles out to all what they deserve–if only the policy makers just stay out of the way.

To some, it might have seemed a Yogi Berra moment when Dwight D. Eisenhower said, “things are more like they are now than they ever were before,” but his fellow Kansans knew what he meant. For nearly three decades after the war, the economic universe was expanding, and it was a patently Keynesian universe. It truly was how things should have been all along: driven by demand, wages high, unemployment low, leisure not just a retirement pipedream, and the top marginal tax rate–91%. That was the universe, by the way, in which American racing had its heyday.

But, when demand is high, it’s apparently the doom of policy to assume that supply is everything. What we have now, as a result, is the widest income disparity between the rich and the rest of us since 1928. That means, perforce, that unemployed, underemployed, and working- and middle-class Americans have no money to spend; and, at the same time, huge amounts of money are being committed to the very kinds of speculative investment that expose the economy to calamity.

The thoroughbred industry’s problem is a lot bigger than just a lack of demand for racehorses and breeding stock. Yes, commercial breeders experience it as such, and it’s painful, but, as I’ve argued before, it’s a mistake to think of thoroughbred sales as even a proximate function of effective demand. Those buyers are, in fact, investors in the system of production. Through them the system distributes young horses for training and channels breeding stock into more efficient use. They are suppliers, not consumers. I emphasize this point because, the more commercial breeders dwell on the faux-demand feature of what is really a supply function, the more they lose sight of effective demand, the consumer.

The consumers are the ordinary folks who used to watch and play the races–you know, the ones who’ve been impoverished by the cockamamie economic policies of the last 30 years. The economy of the thoroughbred industry is in the same demand trough as the larger economy and for the same reasons. It’s about unemployment, declining real wages, and diminished expectations. No matter how rich the rich are, it makes no sense to invest in thoroughbred production during a time when the ordinary people who would otherwise comprise effective demand for the product–as an object of beauty, grace, and wager–are losing their homes and livelihoods. That’s why anyone who cares about racing should raise a holy clamor whenever somebody whines, “If only the rich were richer.” Alan Greenspan?!. . . . Please!

Pedigree According to Darwin

by Roger Lyons

Charles Darwin’s Origin of Species succeeded in convincing most of the naturalists of his time that species difference is generational in nature, but he offended the faith of the clergy, who thought he went too far. Darwin thought the faith of the clergy didn’t go far enough.

You see, Darwin embraced the idea of a primordial Eve who was the Mother of us all, and he found inspiration in the thought of it. Darwin’s admiration for the natural world was magnified by his discovery that it had evolved in the well-ordered form of God’s “great Tree of Life.” And, since then, science has confirmed what hitherto could only be taken on faith. In that respect, science has done religion a great, unrequited favor.

Suppose your pedigree could be traced back to its origin and your entire human ancestry could be shown extending itself on a huge video display. It would look like the familiar binary tree that is used for thoroughbred pedigrees, the distinctive dynamic of which is that the number of nodes, for male and female, doubles with each generational remove. But have you ever thought about what happens at the end of it all?

For an undetermined number of generations backward, the number of individual ancestors would increase with each generational remove, but at a certain point, while the number of nodes in the tree would continue to double as your ancestry traces backward from one generation to the next, the number of individual ancestors occupying those nodes would begin to decrease, with fewer and fewer names distributed in increasing concentrations among the multiplying nodes.
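The doubling is easy to underestimate, so here is a small Python sketch of the arithmetic. The population ceiling in it is a made-up round number used only to show where repetition becomes unavoidable. It is not a demographic claim, and the function names are my own.

```python
def nodes_at_generation(n):
    """Number of pedigree positions at the n-th generational remove."""
    return 2 ** n

# Hypothetical ceiling on how many individuals could have been alive
# in any one generation; chosen purely for illustration.
POPULATION_CAP = 1_000_000_000

def first_forced_repeat(cap):
    """First generation whose node count exceeds the cap, so that
    at least one ancestor must occupy more than one node."""
    n = 0
    while nodes_at_generation(n) <= cap:
        n += 1
    return n

for n in (10, 20, 30, 40):
    nodes = nodes_at_generation(n)
    most_distinct = min(nodes, POPULATION_CAP)
    print(n, nodes, most_distinct)

print(first_forced_repeat(POPULATION_CAP))  # 30
```

Even with a ceiling that generous, the tree has more positions than there could be individuals to fill them by the thirtieth remove.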

Eventually, different parts of the tree would terminate at different generational removes as the entire structure approaches the common origin; and you would know your pedigree to be complete when all of the female nodes are occupied by a single name–that of Eve. Thus, all our pedigrees arrive at the same beginning.

Darwin’s only offense against the clergy was to render as fact the most primordial–and deeply repressed–spiritual longing: for all life to be one body.

My Favorite Matches–Keesep10 Day 1

by Roger Lyons

This post consists of some observations about hip numbers 20, 41, 52, 61, 76, and 90, which were offered on Day 1. My purpose is to highlight some pedigree matches that appear, from a certain statistical perspective, to be exceptionally well made, and I do this after they’ve gone through the ring, so as to render the exercise somewhat more academic than it could otherwise be.

The approach is based on the idea that a given stallion’s “strike rate” with mares that have a given ancestor provides an indication of the stallion’s relation to that ancestor in terms of performance. For example, the stallion Dynaformer has foals out of 45 individual mares with Seattle Slew occurring anywhere in their ancestries, and seven of those mares produced superior runners (winner of an unrestricted stakes, winner of a blacktype-qualifying foreign stakes, or a runner that finishes second in a G1 or G2 race). So, 7/45 is Dynaformer’s strike rate with mares that descend in some way from Seattle Slew.
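As a rough illustration of the bookkeeping, here is a Python sketch of how a strike rate might be counted. The record layout, the `strike_rate` helper, and the mares and ancestors in the toy list are all invented. Only the 7/45 figure at the end comes from the text.

```python
# Hypothetical records: one entry per mare that has a foal by the stallion.
mares = [
    {"mare": "Mare 1", "ancestors": {"Ancestor X", "Ancestor Y"}, "superior_runner": True},
    {"mare": "Mare 2", "ancestors": {"Ancestor X"}, "superior_runner": False},
    {"mare": "Mare 3", "ancestors": {"Ancestor Y"}, "superior_runner": False},
    {"mare": "Mare 4", "ancestors": {"Ancestor X", "Ancestor Z"}, "superior_runner": False},
]

def strike_rate(records, ancestor):
    """Return (hits, opportunities) for mares carrying the given ancestor."""
    carriers = [m for m in records if ancestor in m["ancestors"]]
    hits = sum(1 for m in carriers if m["superior_runner"])
    return hits, len(carriers)

hits, opportunities = strike_rate(mares, "Ancestor X")
print(f"{hits}/{opportunities}")  # 1/3

# The published example, Dynaformer with Seattle Slew, as a percentage.
print(f"{7 / 45:.1%}")  # 15.6%
```

Each mare counts once, however many foals she has by the stallion, which matches the "individual mares" wording above.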

Dynaformer has high strike rates with some ancestors, such as Seattle Slew, but with other ancestors he has low or average strike rates. Imagine, then, the evaluative potential of the strike rates for all ancestors represented in the six-generation ancestries of all the dams of Dynaformer’s foals, aged three and up. Any given mare could be comprehensively assessed as a potential mate for Dynaformer, based on his strike rates with her individual ancestors.

That is, in fact, the approach used below. Based on the sire’s strike rates with the individual six-generation ancestors of the dam, she occupies a percentile rank relative to other mares that have produced foals by the stallion, but the real value of having the data is that it enables pedigree interpretation and inference that is more comprehensively grounded in pertinent facts than is otherwise possible.
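The post does not spell out how the individual strike rates are combined into a single score, so the following Python sketch should be read as one plausible way to get from strike rates to a percentile rank, not as the method actually used. The simple averaging in `ancestry_score`, the "at or below" convention in `percentile_rank`, and all of the sample figures are assumptions.

```python
def ancestry_score(ancestors, strike_rates):
    """Average the sire's strike rates over a mare's ancestors.
    ASSUMPTION: a plain mean; the real weighting is not described."""
    rates = []
    for name in ancestors:
        hits, opportunities = strike_rates.get(name, (0, 0))
        if opportunities > 0:  # skip ancestors the sire has never met
            rates.append(hits / opportunities)
    return sum(rates) / len(rates) if rates else 0.0

def percentile_rank(score, all_scores):
    """Percentage of mares scoring at or below the given score."""
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100.0 * at_or_below / len(all_scores)

# Hypothetical strike rates (hits, opportunities) for one sire.
rates = {"Ancestor X": (2, 4), "Ancestor Y": (1, 10), "Ancestor Z": (0, 5)}

# Hypothetical scores for every mare in the sire's book, including this one.
book = [0.1, 0.2, 0.3, 0.4]

score = ancestry_score(["Ancestor X", "Ancestor Y"], rates)
print(round(score, 2))               # 0.3
print(percentile_rank(0.3, book))    # 75.0
```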

Hip 20 (A.P. Indy-Byzantine, by Quiet American): Byzantine’s ancestry scores at the 94th percentile of mares that have produced foals by A.P. Indy. Out of mares by Quiet American, A.P. Indy has sired Bernardini (multiple-G1) and A. P. Warrior (multiple-G2). Only one Quiet American mare that has produced a foal by A.P. Indy through his 2007 crop was unable to come up with a major stakes winner.

The strike rate of 2/3 with Quiet American didn’t come from out of the blue. A.P. Indy has extremely good numbers with both Fappiano (6/33) and Dr. Fager (10/53), sire and broodmare sire, respectively, of Quiet American, and he’s 6/40 with Quiet American’s third dam, Cequillo. Often, how well or poorly a stallion will do with mares by a given sire is indicated by the stallion’s record with the background ancestry. When a stallion has had no opportunity with an individual broodmare sire, I’m always especially cautious when he has a poor record on either side of the broodmare sire’s ancestry. That’s not the case here.

The yearling’s second dam is by Vice Regent, with which A.P. Indy has a strike rate of 11/43. Now, that’s mostly through Deputy Minister. In fact, A.P. Indy has a strike rate of 1/11 through female strains of Vice Regent, as in this case, but that is the only soft spot in Byzantine’s ancestry. A.P. Indy has a strike rate of 4/20 with Vaguely Noble, sire of the third dam, and a strike rate of 2/10 with Amerigo, sire of the fourth dam.

Hip 41 (Unbridled’s Song-Future Guest, by Copelan): Future Guest’s ancestry scores at the 96th percentile of mares that have produced foals by Unbridled’s Song. Rockport Harbor (G2) is one of two superior runners Unbridled’s Song has from opportunity with only three mares by Copelan (he’s had two other mares whose dams are by Copelan, for a total of five mares).

When it comes to Roberto, sire of Future Guest’s dam, the case becomes more nuanced. Unbridled’s Song has a record of 2/36 overall with mares that have Roberto in their ancestries. However, he’s had only nine mares that had Roberto through female strains, and only seven with Roberto in this pedigree position. One of those seven was Fleet Lady, by Avenue of Flags, and out of Dear Mimi, by Roberto. Fleet Lady is the dam of dual-G1 winner, Midshipman, by Unbridled’s Song.

With Sailor, sire of the third dam, Unbridled’s Song is 3/17, and with Swaps he’s 10/107. When Unbridled’s Song’s weakest strike rate in the ancestry of a mare is roughly 10%, then he’s going to have a pretty good profile.

Hip 52 (Dynaformer-Juke, by Mr. Prospector): Even if you disregard Haka, the G3 winner on the catalogue page, the profile of this yearling’s ancestry is impressive. As it is, Juke’s ancestry ranks at the 96th percentile of mares that have produced foals by Dynaformer.

With Mr. Prospector overall, Dynaformer has a strike rate of 27/209–not bad, but misleading because, with female strains of Mr. Prospector, as in this case, his record is somewhat better, at 8/56. It’s of some concern that the quantity is a bit more impressive than the quality, but, then, that is the weakest part of the dam’s ancestry, with respect to Dynaformer. With Seattle Slew, sire of the second dam, Dynaformer has a strike rate of 7/45, and his strike rate with Seattle Slew in this position of the dam’s ancestry is 3/8.

With Riva Ridge, sire of the third dam, Dynaformer has a strike rate of 4/7, and in this pedigree position the strike rate is 3/4. And, by the way, the fourth dam, Exclusive Dancer, shows up through her son General Assembly in the ancestry of another mare that produced a stakes winner by Dynaformer.
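Fractions with different denominators are hard to compare by eye, so here is a short Python sketch that puts the Dynaformer figures quoted for Hip 52 on a common percentage scale. The numbers are the ones given above. The labels and the `pct` helper are mine.

```python
# (hits, opportunities) as quoted in the text for Dynaformer.
dynaformer_rates = {
    "Mr. Prospector, overall": (27, 209),
    "Mr. Prospector, female strains": (8, 56),
    "Seattle Slew, overall": (7, 45),
    "Seattle Slew, sire of second dam": (3, 8),
    "Riva Ridge, overall": (4, 7),
    "Riva Ridge, sire of third dam": (3, 4),
}

def pct(hits, opportunities):
    """Strike rate as a percentage."""
    return 100.0 * hits / opportunities

for label, (hits, opportunities) in dynaformer_rates.items():
    print(f"{label}: {hits}/{opportunities} = {pct(hits, opportunities):.1f}%")
```

The smaller the denominator, the less weight a percentage can bear, which is why the raw fractions are worth keeping alongside it.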

Hip 61 (Smart Strike-Lassie’s Legacy, by Deputy Minister): Lassie’s Legacy ranks at the 85th percentile of mares that have produced foals by Smart Strike. That’s not as high as other cases listed here, but, like those other cases, the dam’s ancestry is free of ancestors that have been unfavorable to Smart Strike.

With Deputy Minister, Smart Strike has an overall strike rate of 5/42, but that may be deceptive because through female strains, as in this case, the strike rate is 4/29, and when Deputy Minister is the damsire, as in this case, the strike rate is 4/25, including Curlin and multiple graded stakes winner, Tenpins. Quality matters.

With Weekend Surprise, Smart Strike is 2/16 overall and 1/1 as the second dam, as in this case.

Hip 76 (Unbridled’s Song-My Friend C. Z., by Seeking the Gold): My Friend C. Z. scores in the 98th percentile of mares that produced foals by Unbridled’s Song, partly because Unbridled’s Song has a strike rate of 2/6 with mares that have Seeking the Gold in their ancestries and a strike rate of 2/3 with Carols Folly, the third dam, including G1 winners Unbridled Elaine and Political Force. In this case, the catalogue page almost says it all, except for highlighting the very small opportunity from which such good quality was produced.

Hip 90 (Street Cry-Shopping, by Private Account): As a general rule, the younger the sire, the less definitive the statistical profiles. What that means is that, for a young stallion like Street Cry, a profile can score in the 92nd percentile, as in this case, and still have an area of uncertainty.

While Street Cry has a strike rate of 2/6 with Private Account, he remains 0/7 with Majestic Prince, sire of the second dam. But one must keep one’s eye on the ball. Ultimately, the question in view must always be to what extent the ancestry as a whole supports the dam herself. Clearly, Private Account is in Street Cry’s camp, and, when the numbers in the background of the second dam are taken into account, the conclusion must be that a strike rate of 0/7 with Majestic Prince at this stage in Street Cry’s career doesn’t matter. It just hasn’t happened yet.

After all, Street Cry is 8/78 with Majestic Prince’s sire, Raise a Native, 4/34 with Better Self, sire of the third dam, Lady Be Good (which, by the way, shows up in the pedigree of Street Cry G1 winner Cry and Catch Me), and 6/57 with Eight Thirty, sire of the fourth dam. For such a young stallion, this is a very good profile.