The Wheelhouse: June 2016

Wednesday, June 29, 2016

Gibson '68 versus Gooden '85 and the Double Data Type

In our last post, we calculated Clayton Kershaw's career WHIP using the Float data type. In that post I mentioned that the Float data type is particularly good at handling "short" decimals, positive or negative.

What's a "short" decimal? For Swift, it is one that fits in a Float, which is reliable to roughly six or seven significant digits. In baseball terms, Floats are handy for calculating Kershaw's WHIP, Bob Gibson's ERA, Tony Gwynn's batting average, or Kevin Youkilis' OBP. Floats are good at handling numbers of that kind of length.

Unfortunately, Floats are not good at handling larger, finer numbers. Fortunately, it's rare to find these kinds of numbers in baseball, which rarely goes four places past the decimal point. How large and fine am I talking? I'm talking numbers like:

3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067982148086513282306647093844609550582231725359408128481117450284102701938521105559644622948954930381964428810975665933446128475648233786783165271201909145648566923460348610454326648213393607260249141273724587006606315588174881520920962829254091715364367892590360011330530548820466521384146951941511609433057270365759591953092186117

Doubles can handle numbers that large and fine much better than the Float data type can: a Double keeps roughly 15 significant digits, about twice what a Float keeps. In fact, Swift prefers that we assign our fractions and decimals to Double and does so by default if we do not assign them. Why? Because Doubles can handle a larger and finer numerical output, they are also a safer way to deal with numbers. It's like saying, we know Billy Hamilton can steal bases, but if Game 7 of the World Series depended on swiping second, we might go with Alexi Casilla instead (87.917% success rate versus Hamilton's 80.110% good for #1 all-time).

What do I mean by safe? Remember when we talked about uncertain numerical outcomes? Same idea. Swift will not mix the two types for us: if we declare one number as a Float and another as a Double and then try to divide them, the program will not compile, and then we're in trouble.
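Here is a minimal sketch of the kind of type mismatch that stops a Swift program cold. The names and values are made up for illustration (38 and 304 were picked only so the division comes out exact); they are not from the original example.

```swift
// Illustrative values only, chosen so the math is exact.
let earnedRuns: Float = 38
let innings: Double = 304

// Uncommenting the next line stops compilation, because Swift will not
// mix Float and Double operands in one expression:
// let runsPerInning = earnedRuns / innings

// Converting the Float to a Double first makes the types agree.
let runsPerInning = Double(earnedRuns) / innings
print(runsPerInning) // 0.125
```

Sticking with Doubles from the start avoids the conversion step altogether.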

Let's play with a couple Doubles, in this case Bob Gibson's 1968 ERA of 1.12 and Dwight Gooden's 1985 ERA of 1.53. How much better was Gibson's ERA? Let's take a look by fully extending the ERAs.

As we can see, Gibson's ERA in 1968 was 36.19866835764088% better than Doc's in 1985 which is pretty impressive considering how microscopic their ERAs were for those years. Now, if we had assigned both to be Floats how would that look?

Ah ha! See! The Float result, 0.3619866, isn't as precise as the Double result, 0.3619866835764088, because Floats cannot handle such large and fine numbers. Score one for Doubles!
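For anyone who wants to try the comparison at home, here is a rough sketch. It assumes the rounded ERAs quoted above (1.12 and 1.53) rather than the fully extended ones, so its digits will not match the figures in this post. The variable names are assumptions too. The point is only that the Float answer carries fewer trustworthy digits than the Double answer.

```swift
// Rounded ERAs from the text; the post's own figures used more digits.
let gibsonERA: Double = 1.12
let goodenERA: Double = 1.53
let doubleResult = (goodenERA - gibsonERA) / gibsonERA

// The same arithmetic with both values declared as Floats.
let gibsonERAFloat: Float = 1.12
let goodenERAFloat: Float = 1.53
let floatResult = (goodenERAFloat - gibsonERAFloat) / gibsonERAFloat

print(doubleResult) // the Double prints with more digits
print(floatResult)  // the Float runs out of digits sooner
```

Both lines agree for the first several digits and then the Float simply stops.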

Doubles score another win in our favor when Swift assigns fractions and decimals to the Double data type by default if we do not assign them in advance. Here's an example:

If we take a closer look, we can see that because we have not assigned a specific data type, either Float or Double, to bobGibson1968Innings or bobGibson1968EarnedRuns, Swift has assigned both to be Doubles:
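A small sketch of that default in action. The two values are assumptions on my part (304.2 is box-score shorthand for 304 2/3 innings, and 38.0 is written with a decimal point on purpose); what matters is that neither line names a type.

```swift
// No type annotations: Swift infers Double for a literal with a decimal point.
// Both values are assumed for illustration.
let bobGibson1968Innings = 304.2
let bobGibson1968EarnedRuns = 38.0

print(type(of: bobGibson1968Innings))    // Double
print(type(of: bobGibson1968EarnedRuns)) // Double
```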

You'll notice as you learn to code that like shedding training wheels, you'll shed assigning specific data types. How can you get away with this? Like in the example directly above, Swift covers your butt by either assigning decimals to be Doubles by default or by recognizing Strings as anything between two quotation marks or integers as whole numbers. This is helpful as it keeps your programs running rather than stopping to ask what data types these are and gumming up the works.

On Deck: Who's On First With The Boolean Data Type

Wednesday, June 22, 2016

Clayton Kershaw, Floats, and the Incredible Shrinking WHIP

In our last post, we talked about the Swift integer or Int data type and ended up with a glaring problem: We couldn't use Int data types to figure out Cy Young's WHIP. Namely, because the result isn't an integer or whole number, it's a decimal.

Unfortunately, Swift doesn't have a decimal data type. Instead, it has Float and Double data types. Today we'll go over Floats and we'll give calculating WHIP another shot this time using Clayton Kershaw as our example.

What is a Float?
A Float is a data type, like String and Int, that accounts for decimals and fractions and is reliable to roughly six or seven significant digits.

Why are Floats important?
As we saw when trying to calculate Cy Young's WHIP, Floats are important because they help us work with numbers that aren't integers or whole numbers, but rather numbers that are fractions or decimals.

When do we use Floats?
We typically assign Float data types to variables and constants we know will require short decimals. In baseball, this includes batting average, on-base percentage, slugging percentage, OPS, WHIP, ERA, innings pitched, and fielding percentage to name the biggies. Here are some examples:
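A sketch of what such declarations can look like, using stats quoted elsewhere on this blog. The constant names are my own invention; the ": Float" after each name is what tells Swift not to fall back on its default.

```swift
// Stats quoted elsewhere on this blog, each explicitly declared as a Float.
let williams1941BattingAverage: Float = 0.406
let musial1943BattingAverage: Float = 0.357
let gibson1968ERA: Float = 1.12
let young1892WHIP: Float = 1.062
let ruthBestSeasonOPS: Float = 1.3791

print(type(of: williams1941BattingAverage)) // Float
```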

Baseball stats rarely pass the decimal point by more than four places as seen above. This is perfect for Floats, which stay reliable to about six or seven significant digits.

Another good reason to use Floats is to account for uncertain numerical outcomes. Classic cases of uncertain numerical outcomes occur when the output of a function is unknown going into the problem. For example, let's say Kershaw pitched a complete game (nine innings) giving up eight hits and one walk. This gives us a WHIP of 1 ((8 + 1) / 9 = 1). We could have used the Int data type here because we had all whole numbers (9 innings, 8 hits, 1 walk, 1 WHIP).

But what are the chances over the course of an entire season or career that Kershaw's WHIP will always be an integer? He may throw a couple no-hitters (0 WHIP), have a bunch of solid one WHIP games, and maybe some bad 2 WHIP games, but chances are his WHIP will be a Float. Because this is a highly likely outcome it makes sense to assign the Float data type to associated variables and constants early on so they can account for these uncertainties later on.

So let's try calculating Kershaw's career WHIP (as of 6/19/16) using Floats:
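Kershaw's actual career totals are not reproduced here, so treat this as a sketch of the calculation rather than the real thing. The whip function is a hypothetical helper, the first call uses the complete-game example from above, and the second call uses a made-up outing to show a result that is not a whole number. Swap in career hits, walks, and innings to get the career figure.

```swift
// Hypothetical helper: WHIP = (hits + walks) / innings pitched, all as Floats.
func whip(hits: Float, walks: Float, innings: Float) -> Float {
    return (hits + walks) / innings
}

// The complete game described above: 8 hits, 1 walk, 9 innings.
let completeGameWHIP = whip(hits: 8, walks: 1, innings: 9)

// A made-up outing: 7 hits, 2 walks, 6 innings.
let roughOutingWHIP = whip(hits: 7, walks: 2, innings: 6)

print(completeGameWHIP, roughOutingWHIP) // 1.0 1.5
```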

Maybe you're thinking, wait, hits allowed and walks allowed will always be Ints! In the same way that strings and Ints don't play well together, neither do Ints and Floats.

There is a way to keep hits and walks allowed as Ints called type casting. We'll get to it down the road.

On Deck: Doubles

Wednesday, June 15, 2016

Cy Young, a Man of Many Integers

In the last post, we talked about how there are different types of ballplayers (power hitters, contact hitters, power pitchers, junkballers, etc) and how there are different types of Swift data types, specifically the string (anything between quotation marks), and how both types are good at certain things and not even considered for other things.

Today, we'll go over another Swift data type, the Integer or Int.

What is an Int?
If a string is anything between two quotation marks, then an integer is any whole number (e.g., 2, 3, 19, 1002, -49275020457). In baseball terms, integers appear as wins, losses, home runs, hits, stolen bases, and saves to name a few instances.

Why are Ints important?
Declaring a variable (var ichiroUsHitTotal = 2974) or a constant (let ruth = 3) to have an integer data type gives the rest of your function or code a clear heads up that that variable or constant is an integer and as an integer can do certain things and cannot do other things.

For example, let cyYoungCareerWins = 511. In this case, Cy Young's career win total is 511 and because he's retired (and, well, deceased) that is a constant which means his career win total will never change. Because we know his career win total is an integer (511) we can now subtract from it another integer (316, his career loss total) to figure out how many games over .500 he was for his career.
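As a quick sketch, with cyYoungCareerLosses and gamesOverFiveHundred being names I made up to go with cyYoungCareerWins:

```swift
let cyYoungCareerWins = 511
let cyYoungCareerLosses = 316

// Int minus Int gives an Int, so nothing is lost here.
let gamesOverFiveHundred = cyYoungCareerWins - cyYoungCareerLosses
print(gamesOverFiveHundred) // 195
```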

What can integers not do? As we learned in the post on strings, we cannot add strings and integers together. They're like oil and water, Earl Weaver and umpires, or Jon Lester and throwing over to first base.

When do we use Ints?
We use Ints when calculating things, either through addition, subtraction, multiplication, or division. The thing about integers is, we need to know in advance that the final answer will also be an Integer. Why? Good question. Say you wanted to calculate Cy Young's lifetime winning percentage:
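A minimal sketch of that calculation, using the win and loss totals from above (the other constant names are assumptions):

```swift
let cyYoungCareerWins = 511
let cyYoungCareerLosses = 316
let cyYoungGamesDecided = cyYoungCareerWins + cyYoungCareerLosses // 827

// Int divided by Int stays an Int, so the fraction is thrown away.
let cyYoungWinningPercentage = cyYoungCareerWins / cyYoungGamesDecided
print(cyYoungWinningPercentage) // 0
```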

Dude won 511 games. His lifetime winning percentage can't be 0! Of course not. It's 0.61789600967352. Unfortunately, 0.61789600967352 is not an integer. In terms of Swift, it's a Float, and because we divided one Int by another, Swift throws away everything after the decimal point. In this particular case, because the answer is less than one, what's left is zero.

Here's another example to keep an eye out for. While Young had six seasons with a WHIP less than 1.000, let's take a look at his 1892 campaign with the Cleveland Spiders when he had a WHIP of 1.062. Here it goes:
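A sketch of the 1892 calculation with everything declared as Ints. The hit, walk, and inning totals are assumptions, reconstructed so that they reproduce the 1.062 WHIP quoted above; the constant names are mine.

```swift
// Assumed 1892 totals, reconstructed from the WHIP quoted in this post.
let young1892Hits = 363
let young1892Walks = 118
let young1892Innings = 453

// (363 + 118) / 453 is about 1.062, but Int division keeps only the 1.
let young1892WHIP = (young1892Hits + young1892Walks) / young1892Innings
print(young1892WHIP) // 1
```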

Again, because we're dealing with declared integers and the answer isn't an integer, the computer doesn't budge. Instead of a decimal, it hands back an integer, in this case one. On top of that, the computer doesn't pick the closest integer. It simply drops everything after the decimal point, which for positive numbers means it always rounds down. Even if Cy had walked 500 batters that year for an eye-watering 1.905 WHIP, the computer would tell you he had a WHIP of 1.

So what do we need to do to properly calculate things like WHIP, ERA, or OBP? We need Floats.

On Deck: Calculating Clayton Kershaw's career WHIP with Floats.

Wednesday, June 8, 2016

What Makes "Ripken", "4,256", and ".406" strings?

In earlier posts we assigned "Ripken" to the constant battingThird and assigned "3B" to the variable oquendo. Why did we put Ripken and 3B in quotations? By putting them in quotations we turned them into strings. But what are strings?

In baseball, there are different types of players. There are contact hitters, power hitters, base stealers, power pitchers, junkballers, pinch hitters, and the guy on the bench who gives rookies hotfeet.

Contact hitters will hit at the top or bottom of the order. Power hitters will hit in the middle of the order. People who are not base stealers are not going to steal bases. Pinch hitters aren't going to play everyday and the guy on the bench isn't on the bench because he gives good hotfeet.

In other words, each of these types of players has a specific role.

Similarly, Swift has different data types that play specific roles. Today, we'll go over the string data type. In short, anything between two quotation marks is a string. Here are some examples:

"Ripken"
"The Giants win the pennant!"
"4256"
".406"
"%"

They are all strings because they are all surrounded by quotation marks. As you can see, a word can be a string. A sentence can be a string. A whole number or integer can be a string. A decimal or float can be a string. And a symbol can be a string. As long as whatever is written is between quotation marks, it is a string.

Sometimes you'll even see something like this:

var musialBattingAverage = ""

This is an example of an empty string. Why would someone write an empty string? Maybe there is a baseball app that requires the user to pick a player and a year before returning the player's batting average for that year. If, for instance, the user chooses Stan Musial and 1943, the value of "" will then change to ".357", Musial's batting average that year. If the user chooses Musial and his last year, 1963, it will change to ".255". musialBattingAverage is, after all, a variable and can change.
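Sketched out, the life of that variable might look like this. The app and its menus are imaginary; only the batting averages come from the paragraph above.

```swift
var musialBattingAverage = ""   // empty string until the user picks a year
musialBattingAverage = ".357"   // the user picks 1943
musialBattingAverage = ".255"   // the user picks 1963

print(musialBattingAverage) // .255
```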

Why are strings important?
Strings help users of apps input information, edit information, and update information. If you are on baseball-reference.com, and you want to look up the career stats for Yogi Berra you can type in Yogi Berra in the search bar and baseball-reference.com will return Yogi's career stats for you. Here is how baseball-reference.com processes these search queries:

In other words, if you tried to search using other data types such as integers (e.g., 20 (most strikeouts in a game), 287 (Hughie Jennings' record for most career HBPs), 1,406 (Rickey Henderson's record for stolen bases)), floats (.406 (Ted Williams' batting average in 1941), .215 (Mario Mendoza's lifetime batting average, believe it or not), 1.3791 (Babe Ruth's highest single season OPS)) or Booleans (values that are either true or false, e.g., Did Trevor Hoffman play for the Padres? True. Did Don Mattingly manage the Yankees? False.) the baseball-reference.com search engine would show you this:

What can you do with strings?
Good question! You can concatenate or add strings together like this:
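A minimal sketch, with constant names of my own choosing. Note the " " in the middle: Swift glues strings together exactly as written, so the space has to be a string too.

```swift
let firstName = "Derek"
let lastName = "Jeter"

// The + operator joins strings end to end.
let fullName = firstName + " " + lastName
print(fullName) // Derek Jeter
```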

When do you use strings?
I Googled "When do you use strings?" and I got nothing but advice on guitars. My ad hoc answer is we use strings to print out words and other values to the screen for users to view.

What can you not do with strings?
In the same way that a manager should not bat Mario Mendoza cleanup or ask 2012 Jamie Moyer to throw 98 MPH heat, there are things Swift strings cannot do.

As we saw above, you can add strings to other strings ("Derek" + " " + "Jeter" = "Derek Jeter"), but you cannot add strings to other data types such as integers (the number 10), floats (.406), or Booleans (values that return either true or false and nothing else). If you try to add a string to another data type in Swift you'll get this error:
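Here is a sketch of the mistake, with the offending line commented out so the rest still runs. The constant names are assumptions, and the exact wording of the compiler's message may vary between Swift versions.

```swift
let shortstop = "Rizzuto"
let uniformNumber = 10

// Uncommenting the next line stops compilation with an error along the lines of:
// "binary operator '+' cannot be applied to operands of type 'String' and 'Int'"
// let label = shortstop + uniformNumber

print(shortstop, uniformNumber) // Rizzuto 10
```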

Rather, you'll need to cast or convert the 10 integer into a string. Here's what that looks like:
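A minimal sketch of the conversion, again with made-up constant names. String(...) builds a new string from the integer, and two strings can be added.

```swift
let shortstop = "Rizzuto"
let uniformNumber = 10

// String(uniformNumber) turns the Int 10 into the String "10".
let label = shortstop + String(uniformNumber)
print(label) // Rizzuto10
```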

If you want to clean this up a bit, you can add space between Rizzuto and 10 by doing this:
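One way to sketch it (the constant names are assumptions) is to add a third string that holds nothing but a space:

```swift
let shortstop = "Rizzuto"
let uniformNumber = 10

// " " is a string containing a single space.
let spacedLabel = shortstop + " " + String(uniformNumber)
print(spacedLabel) // Rizzuto 10
```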

That's a lot of code for today so let's end with this:

On Deck: Cy Young, a Man of Many Integers

Wednesday, June 1, 2016

Jose Oquendo & Variables

In the history of major league baseball, only four men have played all nine positions in one game. This does not, of course, include Will Ferrell's even more impressive feat of playing for ten different teams in one day. But when I think of utility players the first that comes to mind is Jose Oquendo.

Oquendo played twelve seasons, the first two with the Mets, and the rest with the Cardinals from 1983-1995. In 1988, Whitey Herzog penciled Jose to play second in 69 games, third in 47, short in 17, first in 16, right in 9, center in four, two in left, to catch one and to pitch one.

If Cal Ripken was a constant in the Orioles' batting order, Oquendo's defensive positions were always variable. Here's what Oquendo's defensive assignments for May of 1988 looked like:

How do we represent all of Oquendo's different positions in Swift? Here's how:
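A sketch using the position values mentioned in this post. The order of the assignments is illustrative and is not meant to match his actual May 1988 schedule.

```swift
// Declared once with var, then re-assigned as the lineup card changes.
var oquendo = "2B"
oquendo = "SS"
oquendo = "3B"
oquendo = "CF"
oquendo = "1B"

print(oquendo) // 1B
```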

Unlike when we tried to move "ripken" out of batting third, there are no errors when changing Oquendo's variables.

I like to imagine that Oquendo had several different gloves and when called upon to play a different position he grabbed the appropriate glove. The same applies when working with variables, which are declared once with the keyword "var" and then re-assigned a new value after the assignment operator (=).

When Whitey assigned first base to "oquendo", Oquendo grabbed a first baseman's glove. When assigned to shortstop, Oquendo grabbed an infielder's glove. When Herzog assigned a different position to Oquendo, Oquendo dropped the old glove (in this case, the position value) and picked up the new one.

The same happens with the values of a variable. In this case, the variable "oquendo" has values such as "2B", "SS", "3B", "CF", and "1B". Variables do not hold onto their values the way the constant "ripken" held onto the third spot in the batting order. Rather, a variable grabs its new value and goes with it until a new value, or in Oquendo's case a new position is assigned to him, and it needs to grab a different value (and Oquendo needs to grab a different glove).

Variables are important because they help the Whitey Herzogs of the world use the Jose Oquendos as circumstances change. Ozzie Smith needs a day off from shortstop? We'll put in Jose. Willie McGee's not feeling well. We can stick Jose in centerfield. Things change. Variables allow for their values to change.

When do we use variables? We use variables when we think the value of something will change. Think of Joey Votto's batting average, Jose Iglesias' fielding percentage, Clayton Kershaw's ERA, Jake Arrieta's WHIP, or Ichiro's hit total. They are going to change over the course of a season. They are variables.

On Deck: What Makes "ripken", "4,256", and ".406" Strings?