
Joe Dimaggio - Joltin' Joe or Lucky Joe?

Joe Dimaggio
Joltin' Joe or Lucky Joe

In 1898, Giuseppe Paolo Dimaggio was a fisherman who had just emigrated from his native Sicily to the United States. His wife, Rosalia, followed later, and the couple settled in Martinez, California. Giuseppe continued to ply his trade in America and began selling his catch at San Francisco's Fisherman's Wharf when it really was a fisherman's wharf. Giuseppe and Rosalia had eight children, five boys and three daughters. The seventh child (and #4 son) was christened Giuseppe Paolo after his dad. But all his friends just called him Joe.

Naturally, Giuseppe wanted his five sons to follow in his footsteps and was worried when none of them had any interest in carrying on the family business. In fact, the boys were lazy layabouts! Santa maria purisima! They were acting like Americans! All they wanted to do was hang out with their friends. Why, Vincent - the #3 son (and #6 child) - wanted to do nothing but play baseball! Acting against his father's wishes, Vince even left home to try out for the local minor league team. Well, so the boy made the team? Hunh! Good riddance to bad rubbish!

After work, Giuseppe would sit at home ruminating about his wayward sons, often exploding in voluble Sicilian. Why, young Joe - his namesake - had dropped out of school, and still he didn't want to be a fisherman. He just loafed about. The boy couldn't even hold a steady job selling newspapers! But Mama Dimaggio just told her husband to leave Joe alone. "He's going to be all right," she said.

In 1931, as the seventeen-year-old Joe was walking down the street, he got a hail from a chance acquaintance named Frank Venezi. Frank said that he and a friend, Bat Malafio, were going to start a ball team. Did Joe want to join? Joe thought a moment and said, sure, why not? He walked on.

In early twentieth century America, baseball was everywhere. Kids played in empty lots, companies organized teams after work, and the local neighborhood gyms - the ubiquitous "athletic clubs" - and sometimes just groups of friends fielded teams. What began as casual friendly games soon became organized into local leagues. The teams were so numerous that the games had to be coordinated. That job usually fell to some local city official or well-placed businessman who was a baseball fanatic. Every Sunday there would be three or four games in succession scheduled in each of a city's parks. Many teams played during the week as well. This was the day when America's #1 pastime really was America's #1 pastime.

Teams ranged from the horrible to the terrific. But a mark of a good team was that they could drum up a sponsor. That is, they would find some businessman who agreed to pay for uniforms, bats, and balls, and all he wanted in return was his name on the team jerseys (that and having a team worth betting on). Frank's team was soon picked up by Rossi's Grocery (which specialized in olive oil), and soon Rossi's became one of the top amateur teams in San Francisco. They were so good they were invited to play the prison team at San Quentin, where the warden called the game at a 4-4 tie in the fourteenth inning rather than risk the prison team losing and having a riot.

Now a really good local team might even get to play in a bout against a near-by minor league squad or even catch a major league team during the off-season barnstorming exhibition games. The matches became so competitive that the sponsor might deliberately hire a particularly good player to work for him and even pay him a few bucks extra for the game. Admissions to the games were almost always free but the team could always pass the hat around, and so the players might get a small and variable per-game fee. When players got some money from the games but were not full time ball players, they and the teams became "semi-professionals".

Some semi-professional teams would only pay their best players and might even hire a player on a game-by-game basis. A really good player could get as much as ten dollars a game. Now Joe got to be so good that as many as five or six teams would hire him to play in a single season. By 1932, Joe had moved up to the A-League team sponsored by the big fruit company, Sunset Produce. He was also batting .632.

Of course, the next step up from semi-professional was pro. Professional baseball, like today, was organized into major and minor leagues, although the minor leagues were more independent of the major teams than today. Scouts for the professionals - major and minor - would watch the local games for particularly good talent (at that time college teams were not a common source for professional players). With Joe's batting and fielding (he was a decent shortstop) it was inevitable that some minor league team would ask him to try out. But it also helped that Vince, Joe's older brother, was still in the minors, playing for the San Francisco Seals. Joe tried out for the Seals, and they offered him $225 a month. Joe, though, did not sign on the dotted line. Giuseppe, his dad, did. Joe was still underage and could not legally sign a contract.

Joe played for the Seals for three years, and as we know the Yankees picked him up. By that time Joe was so good that the Yanks were going to pay $75,000 for him. Big bucks in the day. Of course, it was the Seals that would get the money, although Joe, as was the custom of the time, might expect a 10% cut.

However, one night Joe was getting into a car - the circumstances were a bit less clear than related in his autobiography Lucky to Be a Yankee - and he fell and tore a ligament in his knee. Although his recovery was slow, the Yankees still paid the Seals for Joe - but only $25,000. Joe got nothing beyond his negotiated salary.

When Joe joined the Yankees in 1936, Babe Ruth had just departed for Boston. With the Babe gone, Lou Gehrig remained as the undisputed Yankee powerhouse. But Joe soon began to eclipse the Iron Horse, at least in newspaper coverage if not in actual playing statistics. However, within two years, Lou's playing rapidly began to decline due to what has long been believed to be amyotrophic lateral sclerosis, or ALS, and now almost always called Lou Gehrig's disease. Soon Joltin' Joe was the #1 player not only at the Yankees but in all of major league baseball.

As iconic as Joe is now, at the time he found the fans a fickle lot. Once Joe held out on a contract and didn't show up for training camp. He wanted $45,000 and the Yankees offered $25,000. Hell, they said, Lou Gehrig had only been paid $35,000! The holdout was not popular with the fans. Part of the problem was America had switched to a wartime economy, and other workers were tightening their belts. Finally Joe accepted the $25 grand, had to get in shape on his own time, and was not paid for the days he missed. But the fans still booed.

By the late 1940's Joe began to go through times of poor hitting and was making mistakes in his fielding. He would usually come back and play a series of good games, but his consistency was definitely off. Eventually the Yankees' manager, the "Old Perfesser" Casey Stengel, put Joe further down the batting order (Joe had almost always batted clean-up), and finally, in 1950, Joe was benched. Joe was not pleased, and as Casey himself might have put it, it was less the matter of the benching than the manner that bothered Joe. He found out he was out of the lineup by reading the papers. The last straw came when Joe went out to centerfield, and Casey sent out a replacement in mid-inning. Joe refused to come in until the inning was over, and some teammates doubt he ever spoke to Casey again.

Then in 1951 a kid from Oklahoma showed up after a brief stint with the Yankees farm club. He was a more powerful hitter than Joe and had incredible speed. Soon Mickey Mantle was getting as much copy as Joe had back in 1936. Joe retired after that year, and Mickey took his place in center field. But fans missed Joe, and in 1955 he was elected to the Hall of Fame. Polls are always subjective and changeable, but one poll put Joe as the #6 player of all time.

But everyone knows that Joe's biggest claim to fame was he had the longest unbroken hitting streak in baseball. From May 15 to July 16, 1941, Joe had at least one hit in each of the 56 games. This is a record that has never been surpassed, and no one has even come close. So Joe's fans have always basked in the knowledge that there was never a player like Joltin' Joe.

Then in 2008 two computer scientists at Cornell, Samuel Arbesman and Steven H. Strogatz, ran what is known as a Monte Carlo simulation on the entire history of hitting in major league baseball. Monte Carlo calculations were developed by a number of mathematicians including Stanislaw Ulam and (usually) employ a computer to solve problems which are too complex for an "analytical" solution. Using the batting history of the individual players, Samuel and Steven had the computer simulate each player stepping up to bat. Once they finished letting everyone bat over the history of baseball, they started all over again, repeating the history. Because of the strong element of chance in swinging at a baseball, any individual time for a player at the plate differed from that of the previous history even though the average performance of a batter was the same.

After reliving the history of baseball 10,000 times, the study closed with a conclusion that makes Dimaggio fans despair. Joe's hitting streak was nothing more than what you would expect with so many major league batters stepping up to the plate over a century and a half. In fact, there is an even chance that someone would have hit a longer streak than Joe's. Joltin' Joe, it seems, was simply Lucky Joe.

Babe Ruth
He left for Boston.

Naturally Joe's fans struck back with a major league argument that Joe's 56 game streak was not just a (hitting) streak of luck. The more statistically minded of Joe's fans say his streak is a true outlier. That is, Joe's performance was so far above the rest of the batters that it cannot be ascribed to chance based upon the distribution of the data. But to understand their argument you must step up to bat with a smattering of statistical knowledge.

To prove a point is an outlier you cannot just show it lies far removed from the rest of the data. Data distributes itself, and some points lying far away from the average are expected. So you need to look not at how far a data point differs from the average of the data, but at how far it differs from the next closest point. This is the basis for (but is not exactly what we are doing) the famous "Q-Test" for outliers.

So when considering if Joe's batting streak was an outlier, we look at how the players with the longest hitting streaks for a year compare with their nearest competitor. That is, if in 1911, Ty Cobb had a hitting streak of 40 games, what was the next number down, so to speak? Well, we find that in 1987, Paul Molitor had a season with 39 consecutive games with a hit. Ty's "nearest neighbor" value is therefore 40 - 39 = 1. Of course some players tie, but still the statistic we assign to a player is the distance to the next number down from his own.
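The "next number down" bookkeeping can be sketched in a few lines of Python. This is only an illustration: the function name `gaps_to_next_lower` and the four-entry list are assumptions made here, not the data behind the article's graph. Cobb's 40 and Molitor's 39 come from the paragraph above; DiMaggio's 56 and Keeler's 44 are taken from the record books, and every other streak is deliberately left out, so Keeler's gap in this toy list is larger than it would be in a full table.

```python
def gaps_to_next_lower(streaks):
    """Map each player to the distance between his streak and the next lower distinct streak.

    Tied players share the same gap. The shortest streak in the list has no
    lower neighbor, so its gap is None.
    """
    distinct = sorted(set(streaks.values()), reverse=True)
    gap_by_length = {high: high - low for high, low in zip(distinct, distinct[1:])}
    return {player: gap_by_length.get(length) for player, length in streaks.items()}


# Toy, deliberately incomplete list (an assumption for illustration only).
toy_streaks = {
    "DiMaggio 1941": 56,
    "Keeler 1897": 44,
    "Cobb 1911": 40,
    "Molitor 1987": 39,
}

for player, gap in gaps_to_next_lower(toy_streaks).items():
    print(player, gap)
```

Working on distinct streak lengths is one way to honor the rule about ties: two players with the same streak both get the distance to the next lower value rather than a gap of zero.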

To see how far Joe "lies out" from the others we take the record of hitting streaks for a year and graph not the length of each record holder's streak, but his "nearest competitor number". As we said, Ty Cobb was one game ahead of his nearest competitor, so his number is 1. The plot shows this number for all players holding the yearly high hitting streak.

Baseball's "distance from previous competitor" graph is very revealing. Virtually everyone was only one game above their next competitor and only a few were two games in front. But Joe was twelve - count 'em - twelve games ahead of his nearest competitor. Joe's runner-up was Willie Keeler, a player from the late nineteenth and early twentieth century. Could Joe's twelve-game lead be chance?

Figure: "Distance from Competitor" Plot

At this point Joe's fans step up to the plate with the z-test. The z-test is a common statistical method for judging whether an occurrence is real - that is, due to some actual cause - or simply due to chance fluctuations in the data. To do the z-test, you must calculate the z-score for the point of interest. As Captain Mephisto might have said, to do that is very simple, really. What you do is subtract the average of all the numbers in the graph from Joe's number. The average (including Joe's number of 12) is 1.26. So the difference for Joe is 12 - 1.26 = 10.74. Then you divide the difference by the standard deviation of all the numbers, which is 1.52. The result - the z-score - is 10.74 / 1.52 = 7.06. This number, then, is a "statistic" on how far Joe is from the rest of the herd.
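The arithmetic is short enough to check directly. The sketch below simply plugs in the mean (1.26) and standard deviation (1.52) quoted above; the function name `z_score` is a label chosen here. With those rounded inputs the quotient comes out a hair under 7.07, so the 7.06 in the text presumably reflects unrounded inputs or truncation.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations by which value lies above the mean."""
    return (value - mean) / std_dev


# Joe's gap of 12 games, with the mean and standard deviation quoted in the text.
joe_z = z_score(12, 1.26, 1.52)
print(f"z-score for a gap of 12: {joe_z:.3f}")
```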

But just how far is Joe from everyone else? Well, the mathematicians tell us a z score of 7.06 occurs approximately once out of every trillion times. This is rare enough that Joe's hitting streak was not something expected by luck alone. Joe must have had something else that made him stand out. Case closed, say Joe's fans.
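For readers who want to check the "once in a trillion" figure, here is a minimal sketch. It assumes, as the fans' argument does, that the data follow a normal distribution, and it uses the one-sided tail. The function name `upper_tail_normal` is chosen here for illustration; `math.erfc` is the complementary error function from Python's standard library.

```python
import math


def upper_tail_normal(z):
    """P(Z >= z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p = upper_tail_normal(7.06)
print(f"one-sided tail probability: {p:.2e}")
print(f"that is roughly 1 chance in {1.0 / p:.2e}")
```

The result is on the order of one in a trillion, which is where the figure in the text comes from. Everything hinges on the normality assumption.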

But - as we may guess - there is a "but" - and it's a big but.

Figure: Baseball's Nearest Neighbor z-Test

The "but" is that the success - or rather the validity - of the z-test is highly dependent on whether the data comes from the famous bell-shaped "normal" or Gaussian distribution. If the data is not normally distributed you might think you've found an outlier when what you've really done is simply located a perfectly typical data point from a "non-normal" distribution. Unfortunately, non-normal distributions are far more common than many people think.

What we decided to look at is the record hitting streak for the year. That is, we are looking at what is called a rare event phenomenon. Now rare events do not in general occur with a normal distribution. Instead the distribution tails and is not symmetric. That is, it stretches out on one side. So not only does the z-test using a normal distribution not apply, it will make extreme values appear anomalous when they are really simply what you expect. An additional complication is that we are looking at discrete numbers - that is, events that occur only with whole numbers. Again the z-test is most applicable to a continuous distribution. So, Joe's detractors say, all that work with the z-test brouhaha is all for naught. Sad to say, they are correct.
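How much a long tail matters can be seen with a small comparison. The sketch below sets the normal tail against an exponential distribution with mean 1 and standard deviation 1. The exponential is picked here purely because it is a simple one-sided, long-tailed example; it is an assumption for illustration, not the actual distribution of streak gaps, and the function names are hypothetical.

```python
import math


def normal_tail(z):
    """P(X >= mean + z * sd) if X is normally distributed."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def exponential_tail(z):
    """P(X >= mean + z * sd) for an exponential X with mean 1 and sd 1 (valid for z >= -1)."""
    return math.exp(-(1.0 + z))


for z in (2.0, 3.0, 7.06):
    print(f"z = {z}: normal {normal_tail(z):.2e}, exponential {exponential_tail(z):.2e}")
```

At a z-score of 7.06 the exponential tail is larger than the normal tail by many orders of magnitude. The same z-score that looks impossible under a bell curve can be merely uncommon when the distribution stretches out on one side.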

That, though, is why you run Monte Carlo calculations where you determine the distribution of the data and odd-ball distributions are found as a matter of course. Specifically you can use a Monte Carlo calculation to determine how often in the history of baseball we can expect the record holder of hitting streaks to be ahead by at least twelve games from his nearest competitor.

Which is what the two Cornell scientists did. In their 10,000 histories of baseball they found that a gap of 12 games between a record holder and the nearest competitor was quite common. In 20% of the "histories" the record setter surpassed his nearest competitor by 12 games or more. For Joe's gap to be something not due to chance - i.e., "statistically significant" - the 12 game gap would have to happen less than 5% of the time. Yes, it requires a bit of luck, but something with 20% frequency can happen quite easily.
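The idea of replaying history and measuring the gap at the top can be sketched with a toy model. This is not the Cornell program: the league size, the per-game hit probabilities, the season length, and all function names below are hypothetical, so the printed fraction should not be expected to match the 20% figure. Each player is reduced to a single probability of getting at least one hit in a game, and games are treated as independent.

```python
import random


def longest_streak(p_game, n_games, rng):
    """Longest run of consecutive games with a hit in one simulated season."""
    best = current = 0
    for _ in range(n_games):
        if rng.random() < p_game:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def record_gap(p_games, n_games, rng):
    """Gap between the longest streak in one history and the next lower distinct value."""
    lengths = sorted({longest_streak(p, n_games, rng) for p in p_games}, reverse=True)
    return lengths[0] - lengths[1] if len(lengths) > 1 else 0


def fraction_with_gap(p_games, n_games, min_gap, n_histories, seed=0):
    """Share of simulated histories in which the record leads by at least min_gap games."""
    rng = random.Random(seed)
    hits = sum(record_gap(p_games, n_games, rng) >= min_gap for _ in range(n_histories))
    return hits / n_histories


# Hypothetical league: 61 players with per-game hit probabilities from 0.70 to 0.88.
p_games = [0.70 + 0.003 * i for i in range(61)]
share = fraction_with_gap(p_games, n_games=150, min_gap=12, n_histories=100)
print(f"share of toy histories with a gap of 12 or more: {share:.2f}")
```

The point of the exercise is the method. No bell curve is assumed anywhere, and the frequency of a twelve-game gap is simply counted.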

Monte Carlo calculations can be fun (if you like that sort of stuff) and are not particularly difficult to program. So CooperToons decided to investigate the question of Joe's hitting streak. Instead of writing a sophisticated program like the Cornell scientists did, CooperToons wrote a program that simply determined how many games you have to play in to hit a streak of any given length. The results are most interesting on one hand and rather mundane on the other.

Lou Gehrig
The Iron Horse

So we're going to let both Joe and Willie step up to bat - millions of times. When we get to the end of a hitting streak of 56 games or more, we'll record how many games they had played before the streak was broken. Then we'll let them keep going, and with each further hitting streak of 56 games or more we'll again write down the number of games they had played before the streak was broken. When we're done we'll count the number of streaks.
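The procedure just described might look something like the following. This is a reconstruction under stated assumptions, not the CooperToons program itself: each game is reduced to one independent draw with a fixed probability of at least one hit, and the function name, the per-game probability of 0.90, and the number of games are all hypothetical.

```python
import random


def count_long_streaks(p_game, n_games, target=56, seed=0):
    """Play n_games and record every completed streak of at least target games.

    Returns a list of (streak_length, games_played_when_the_streak_ended).
    """
    rng = random.Random(seed)
    current = 0
    completed = []
    for game in range(1, n_games + 1):
        if rng.random() < p_game:
            current += 1              # at least one hit: the streak continues
        else:
            if current >= target:     # the streak just ended on the previous game
                completed.append((current, game - 1))
            current = 0
    if current >= target:             # a streak still alive at the final game
        completed.append((current, n_games))
    return completed


# Hypothetical batter who gets a hit in 90% of his games.
streaks = count_long_streaks(p_game=0.90, n_games=500_000)
print("streaks of 56 or more:", len(streaks))
if streaks:
    print("first one ended after game", streaks[0][1])
    print("longest:", max(length for length, _ in streaks))
```

Fixing the seed makes a run repeatable. Changing it gives another "history" with the same batter.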

First, the most important variables are - no surprise - the batting percentage (PCT) and the number of times you bat per game (AB). But the critical factor in understanding how anyone can have a Joe Dimaggio-type hitting streak is how sensitive the length of the streak is to these variables. That is, a small difference in batting percentage or in at bats per game can make a big difference in how many successive games you can have with at least one hit.

For instance, if you take Joe's overall batting average for 1941 (when he had his 56 game streak) of 0.357 and his 3.89 at bats per game, the computer found something very disturbing. It seems Joe would never have hit a streak longer than 49 games. In millions and millions of seasons, Joe never hit a 56 game streak. It wasn't until we let Joe hit with a 0.408 average and 3.98 at bats per game - his hitting statistics during the streak - that he began having some seasons with hitting streaks of 56 games. Willie, though, with his 0.424 average and 4.37 at bats per game in 1897, would have long hitting streaks - one was even 80 games.
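The sensitivity is easy to see with a back-of-the-envelope model. The article does not say how its program turned PCT and AB into hits, so the sketch below makes its own assumptions: every at bat is an independent trial that succeeds with probability PCT, and a fractional number of at bats is simply used as an exponent. The function name is hypothetical.

```python
def per_game_hit_probability(avg, at_bats_per_game):
    """Chance of at least one hit in a game, assuming independent at bats."""
    return 1.0 - (1.0 - avg) ** at_bats_per_game


cases = {
    "Joe, 1941 season": (0.357, 3.89),
    "Joe, during the streak": (0.408, 3.98),
    "Willie, 1897": (0.424, 4.37),
}

for label, (avg, at_bats) in cases.items():
    p_game = per_game_hit_probability(avg, at_bats)
    p_window = p_game ** 56  # chance that 56 particular games in a row all have a hit
    print(f"{label}: per game {p_game:.3f}, 56 in a row {p_window:.2e}")
```

Under these assumptions the per-game probabilities differ by less than a tenth, yet raising them to the 56th power spreads the three cases across more than two orders of magnitude. The absolute values depend entirely on the simplifications and need not match the article's own simulation.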

Figure: Joe and Willie at Bat

Well, let's see what would happen if we let Joe and Willie step up to bat. The plot of the hitting streak odds shows how many games the players had to play to hit in 56 consecutive games. Willie should have had a much better chance of hitting the streak than Joe - and he does. The Cornell scientists agree with that. They found Willie was the #1 most likely player to hit the streak. Joe's ranking was coincidentally #56.
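The number of games needed can also be estimated without simulation. For independent games that each produce a hit with probability p, the expected number of games played until a run of n consecutive games with a hit is first completed is (1 - p^n) / ((1 - p) p^n), a standard result for runs of repeated trials. The sketch below applies it with per-game probabilities derived from the batting figures quoted earlier, again assuming independent at bats. Those modeling choices and the function names are assumptions of this sketch, so the output is not expected to reproduce the plot.

```python
def per_game_hit_probability(avg, at_bats_per_game):
    """Chance of at least one hit in a game, assuming independent at bats."""
    return 1.0 - (1.0 - avg) ** at_bats_per_game


def expected_games_until_streak(p_game, streak_len):
    """Expected games until streak_len consecutive games with a hit (requires 0 < p_game < 1)."""
    p_streak = p_game ** streak_len
    return (1.0 - p_streak) / ((1.0 - p_game) * p_streak)


for label, avg, at_bats in [
    ("Joe, 1941 season", 0.357, 3.89),
    ("Joe, during the streak", 0.408, 3.98),
    ("Willie, 1897", 0.424, 4.37),
]:
    p_game = per_game_hit_probability(avg, at_bats)
    games = expected_games_until_streak(p_game, 56)
    print(f"{label}: about {games:,.0f} games on average")
```

As a sanity check on the formula, a coin-flip batter (p = 0.5) needs two games on average for a one-game streak and six for a two-game streak.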

Something was going on. You can easily say someone should have hit in 56 consecutive games, but why was one of the least likely players the one who actually ended up holding the record? This, in turn, implies that Joe's streak was not due to random chance. Remember, statistics identifies that an event is not due to chance, but it does not identify the cause itself. That requires a human brain to go back, review the data, and find causes other than random variability that can produce the observed event. Were there any specific causes - not necessarily Joe's hitting - that could have helped him maintain his streak?

There were indeed three non-random events during Joe's 56 games that could have given Joe the advantage, all of which may have been influenced by the fact that people knew Joe was pushing for the new record. In two games, Joe got on base but with plays that could have been - some say should have been - ruled errors on the part of the fielder. If you get on base due to an error, the play is not normally counted as a hit. Some people think it is possible the official scorer - who, like everyone else, was hoping Joe would keep going - had (perhaps unconsciously) favored Joe. If that was the case, then yes, the streak was not due to chance, but the non-chance event was not Joe's ability as a hitter.

There was also one game where Joe had no hits and the last inning was in progress. The Yanks only needed one run to close out the game. The batter realized if he belted in a double - which was possible - Joe would not get his chance to bat and his streak (which was 44 games) would be over. Could he bunt, the batter asked the coach, and the coach said yes. He bunted, got on base, and then Joe came up and smacked in a run to keep the streak going. Would Joe have gotten a hit without the decision to bunt? Who knows? But certainly a non-random event at least gave Joe a chance to keep the streak going.

So what do we conclude? Joe should not have made the streak - but he did. True, he probably got some breaks, but he may have been on the receiving end of bad calls as well. Finally there's one more thing to consider. Joe's 56 game streak wasn't the only hitting record Joe made. In 1933, while still in the minors, Joe hit in 61 consecutive games. That was good for the #2 spot behind Joe Wilhoit, who had a streak of 69 games playing for the Wichita Jobbers in 1919.

So we pose another question for the computer. What are the odds that one player will hold both the #1 record for consecutive games with a hit in the majors and the #2 record in the minors? Is that likely due to chance?

After all, we are talking about Joltin' Joe here. Not Mr. Coffee.

References.

"A Journey to Baseball's Alternate Universe" Samuel Arbesman and Steven H. Strogatz, the New York Times, March 30, 2008.

"A Monte Carlo Approach to Joe DiMaggio and Streaks in Baseball", Samuel Arbesman and Steven H. Strogatz, Unpublished.

Baseball-Reference.com, http://www.baseball-reference.com/. Baseball statistics presented in a way that statistics freaks can enjoy.

"Calculating the odds: DiMaggio's 56-game hitting streak", Peter Goodrich, AllBusiness.com, http://www.allbusiness.com/health-care-social-assistance/social-assistance-individual/236667-1.html. Despite the annoying flashing and walk-across-the-screen-while-you-try-to-read ads, this article shows how difficult it is to give an analytical answer to the probability of hitting at least once in a given number of consecutive games.

"Does Joe DiMaggio's Streak Deserve an Asterisk? Report Suggests Slugger May Have Gained From Yankees' Relationship With Official Scorer", John Allen Paulos, http://abcnews.go.com/Technology/WhosCounting/story?id=3694104&page=1. A Temple University mathematician discusses the odds that Joe could get the streak.