Tuesday, March 27, 2012

When Yes and No are Not Enough


Imagine a world where everyone viewed everything as “black-or-white”.  A person is very satisfied with life until, at some point, they instantly become very unsatisfied.  Children in school either have no knowledge of a subject or know everything about it.  And neighborhoods are either so safe that the residents have never even heard of a lock, or so dangerous that armed guards must be hired for every trip to the store.  This would indeed be a strange world to live in, so it may be surprising that the way we collect data often makes it appear that we live in exactly this sort of world.

Data of this type is called “binary”, meaning that only a yes-or-no answer is recorded.  This sort of data is fine for questions like “Do you live in Kansas?”, “Did you vote for Bayes last election?”, or “Are you male or female?”, but it is less appropriate when the answer comes in varying degrees, such as: “Were you satisfied with the seminar?”.  For those questions we should use a rating scale instead.  Let's take a look at an example to see why.

Say we are presenting a seminar on technology and we want to assess whether the participants have learned to use Excel.  We will ask the people taking the class two questions, before and after the seminar, to measure how much they learned.  The questions will be “Could you use Excel if needed?” and “On a scale of 1-10, rate how well you could use Excel if needed (10 being best).”  After the seminar we take a look at our results (it seems the seminar turnout was poor, since there were only three participants).


Looking at just the yes/no responses, we may be disheartened: only one person improved from a No to a Yes, and the class seems to have been useless to another, since they could already use Excel.  Looking at the rating scale, however, we see that the seminar was actually a success: every participant learned, with an average gain of 3 points on the scale.  The seminar was the same, yet our conclusions could be quite different simply because of the way we collected our data.

So why is a rating scale better for data like this?  Because it lets us measure slight changes more easily.  It is relatively difficult to move a response from a No to a Yes, but much easier to move a rating up or down one or two points.  Further, we can always convert rating-scale data back to a binary type by grouping the responses (say, 1-5 = No and 6-10 = Yes), but we cannot turn binary data into rating data.
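To make this concrete, here is a small sketch in Python.  The before-and-after ratings are invented to mirror the story above (the post's actual results table is not reproduced), but they show how the same learning looks on each scale:

```python
# Hypothetical 1-10 ratings from three seminar participants, before and after.
# These numbers are invented for illustration, not the post's actual data.
before = [2, 4, 7]
after = [5, 7, 10]

def to_binary(rating, cutoff=6):
    """Collapse a 1-10 rating into a yes/no answer (6 or above = 'Yes')."""
    return rating >= cutoff

# On the rating scale, every participant improved, by 3 points on average.
avg_gain = sum(a - b for b, a in zip(before, after)) / len(before)
print(avg_gain)  # 3.0

# Collapsed to binary, only one participant flips from No to Yes, and the
# learning of the other two disappears entirely.
flips = sum(to_binary(a) and not to_binary(b) for b, a in zip(before, after))
print(flips)  # 1
```

Note that the binary view can always be recovered from the ratings (that is all `to_binary` does), but there is no function going the other way.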

When designing surveys or intake forms, take a minute to think about what kind of data you are collecting, and consider whether a rating scale could reasonably be used.  The extra “shades of gray” you will find in your data will make it easier to keep you pointed towards the truth.

Wednesday, March 7, 2012

Switch now and save!


It seems that every time I turn on the TV, some insurance company is saying something to the effect of “People who switch their insurance from Company A to Company B save on average $500 a year for the same coverage,” and then the next ad says “People who switch their insurance from Company B to Company A save on average $500 a year for the same coverage.”  How is this possible?  Does it mean that if everyone switched insurance companies, everyone's rates would go down?  Sadly, no (if it were true, I would just switch companies enough times to get free insurance), and yet the ads' statements are correct.  So what is going on?

First let's take a hypothetical sample:
Each shaded square is the quote from the insurance company that each person is currently using, while the unshaded squares are the quoted prices for the same coverage with the other company.  So Person 1 is paying Company A $1,320 for insurance while he could get the same policy from Company B for $845.


The first thing to note is that the rates for the two companies are the same on average ($1,455.88 for both Company A and Company B).  So the insurance companies offer the same coverage for the same price overall, and yet both can correctly claim that “people who switch from them to us save $500”.
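That first point can be sketched with invented quotes (these are not the post's original table, whose average was $1,455.88): both companies charge exactly the same amount on average, even though the quotes differ a lot person by person.

```python
# Invented quotes for four people; each entry is what the named company would
# charge that person for the same coverage. Not the post's original data.
quotes_a = [1345, 900, 1400, 700]   # Company A's quote for Persons 1-4
quotes_b = [845, 1600, 900, 1000]   # Company B's quote for the same people

# Both companies charge the same amount on average...
avg_a = sum(quotes_a) / len(quotes_a)
avg_b = sum(quotes_b) / len(quotes_b)
print(avg_a, avg_b)  # 1086.25 1086.25

# ...but no individual person is quoted the same price twice, which is what
# makes the dueling switch-and-save claims possible.
```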

Next let's just see what happens if everyone switches policies:
Here are the savings if everyone switched insurance policies.  Now we see something strange: if everyone were to switch policies, each person would expect to save -$500 on average, i.e. everyone would pay $500 more overall.  This is quite the opposite of what the ads seem to claim.  So where are the savings?
  
Well, we need to pay close attention to the wording of the insurance companies' statements: “Those who switch save on average $500”.  Now, who would switch?  Only the people whose quotes are lower (and arguably significantly lower) than their current rate.  So of course the average switcher will save money.

So if we go back to our data now and have the people switch that received lower quotes we see:
The light green cells are customers who switched, and the dark green cells are customers who stayed with their old policy.  Now, when we calculate the money saved by those who switched, we see an average savings of $500 per customer, no matter which policy they switched from or to.  Case closed.
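The selection effect can be sketched with invented numbers (again, not the post's original table): if everyone switched, the average “saving” would be a loss, yet the self-selected switchers really do save $500 on average.

```python
# Invented sample: (Company A quote, Company B quote, current company).
# Not the post's original data.
people = [
    (1345, 845, "A"),   # A customer with a much cheaper B quote
    (900, 1600, "A"),   # A customer whose B quote is far worse
    (1500, 900, "B"),   # B customer whose A quote is far worse
    (700, 1200, "B"),   # B customer with a much cheaper A quote
]

def saving_if_switch(quote_a, quote_b, current):
    """Yearly saving from moving to the other company (negative = pays more)."""
    return quote_a - quote_b if current == "A" else quote_b - quote_a

savings = [saving_if_switch(*p) for p in people]

# If everyone switched, the average "saving" is actually a loss...
print(sum(savings) / len(savings))  # -75.0

# ...but only the people with a cheaper quote actually switch, so the
# advertised average is computed over a self-selected group.
switchers = [s for s in savings if s > 0]
print(sum(switchers) / len(switchers))  # 500.0
```

The ad's arithmetic is correct; it is the conditioning on “those who switch” that does all the work.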

This insurance example highlights a common problem we have in interpreting numerical results.  Statements are crafted in ways that are true but can be misleading (even when they are not meant to be).  It is the job of the person reporting the results to be as clear as possible, but it is also the responsibility of the people relying on those results to make sure they fully understand the statements.  Comments such as “the samples were randomly selected”, “outliers were thrown out”, or “out of our initial testing cohort these three results were found to be highly significant (p-value < 0.01)” seem run-of-the-mill, but without fully understanding what actually happened, they may leave the analysis of little practical use.  When something doesn't make sense, or is not spelled out clearly, we need to be sure to ask questions so that we can keep ourselves pointed toward the truth.