I’ve been known to get a little salty about how NBA fans, especially those of the Twitter variety, talk about and use statistics. I am by no means a statistics expert, but I’ve studied it enough to know that often what we refer to as “statistics”, as basketball fans, is just arithmetic. But that is a case of no harm no foul. Really, we are just being a bit conspicuous with our use of numbers, so we try and give it a more impressive name.
However, this point of the season always brings up a pet peeve of mine. Normally, I would do breathing exercises or count to ten, but I thought a more productive thing to do would be to write about it here for all of you intelligent and clever fans. If you follow me on Twitter, you’ve probably guessed I’m talking about the common usage of “small sample size” among NBA fans.
In short, I think a lot of people are using it incorrectly. When I see someone say something along the lines of, “Well, yeah, but small sample size,” what they are often actually saying is, “We have seen a trend over a few games, and I don’t expect it to continue for much longer.” Sometimes that’s fine, but it can also be a blatant misuse of the concept. The other thing that bothers me is that it becomes a conditional statement. That is, people use it when it helps a point they are trying to make, but suddenly forget it when it damages their argument. Nope. When sample sizes are small, that is a statistical and mathematical reality that affects any estimate you want to make. There is no way around this. We can only say so much when we have fewer than 20-30 observations.
Obviously, we see these types of statements a lot in October and November, because every sample size is small. We’ve only played a handful of games. The truly relevant use of small sample size for statisticians is not inherently about something failing to meet your expectations. It is a bit more nuanced. Let me explain by talking about Tyreke Evans (bet you thought it was going to be another guard, right?).
Here’s the thing. In eight games this season, Tyreke has been really bad at finishing at the rim. He is only shooting 43.7% within 3 feet. For his career, Tyreke has been a good to very good finisher with last year being his worst season when he finished at a rate of 54.4%. His best season was 2011-12 (65.4%). The chart below shows his shooting percentages at the rim over his career.
This is a pretty great case study of small samples. Basically, we have seen Tyreke shoot poorly at the rim. However, we haven’t seen a lot. Could this just be a fluke, or is something going wrong? Well, with that type of uncertainty, statistics sure can help.
First, I want to know how strange what we’ve seen really is. In other words, given the number of shots he has taken this year, how likely is it that he would hit that many or fewer? We can use something called a binomial distribution to answer this type of question. You can do some googling and find some clear explanations of what a binomial distribution is, so I won’t repeat them here. To give you a basic idea, a binomial distribution looks at cases where each trial is either a success or a failure. Think about flipping a coin. It either comes up heads or tails. Then it asks: how many times did you flip it, and how many times did you get heads? Once you give it that information and the probability of a success vs. a failure (50-50 for the coin example), it tells you the probability of getting that many heads in that many flips.
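For anyone who would rather see the coin example as a calculation, here is a minimal sketch in Python. The function name `binomial_pmf` is my own label, not something from a stats package, and it only uses the standard library. It gives the probability of getting exactly a certain number of heads in a given number of flips.

```python
from math import comb


def binomial_pmf(successes: int, trials: int, p_success: float) -> float:
    """Probability of exactly `successes` in `trials` independent tries."""
    ways = comb(trials, successes)  # number of orderings that give this count
    return ways * p_success**successes * (1 - p_success) ** (trials - successes)


# Fair coin, 10 flips: how likely is exactly 5 heads?
exactly_five = binomial_pmf(5, 10, 0.5)
print(f"P(exactly 5 heads in 10 flips) = {exactly_five:.4f}")
```

Even the single most likely outcome, 5 heads, comes up only about a quarter of the time, which is a decent reminder of how much wiggle room 10 flips leaves.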
Before I get to Tyreke I should say something about the assumptions of a binomial distribution. A binomial distribution assumes that each trial is independent. That is to say, a previous outcome will not affect future trials. In Tyreke’s case, missing or making a shot will not affect the probability the next shot goes in. This might be a little reductive. A player might have more confidence after making a shot, which may lead to a higher probability he makes another. Having said that, it seems reasonable to suggest that shots are closer to independent than dependent. That is why we see great shooters consistently shoot around the same percentage for a decade.
Anyway, back to Tyreke. He has shot 71 times at the rim this season (flips of the coin) and has made 31 of them (number of heads), which is where the 43.7% comes from. Given his worst-season rate at the rim, 54.4%, what is the probability that Tyreke would make 31 or fewer shots? In our coin example, I would have used 50% for a fair coin. For layups, however, I have to pick a value to input, so I chose his worst-season percentage to prevent any screams of bias. The result? The probability that Tyreke would make 31 or fewer layups on 71 attempts is .0451, or 4.51%. (Note: I’m using the cumulative distribution function, which adds up the probabilities for every outcome from 0 to 31 made shots. It didn’t make a significant difference either way, but I wanted to know the odds of him doing this well or worse.)
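If you want to check that number yourself, the calculation can be sketched roughly like this in Python. This is not necessarily the tool I used; `binomial_cdf` and the variable names are just illustrative, and the inputs are the ones from the paragraph above (71 attempts, 31 or fewer makes, a 54.4% baseline). It also leans on the independence assumption discussed earlier.

```python
from math import comb


def binomial_cdf(max_successes: int, trials: int, p_success: float) -> float:
    """Probability of `max_successes` or fewer successes in `trials` tries."""
    return sum(
        comb(trials, k) * p_success**k * (1 - p_success) ** (trials - k)
        for k in range(max_successes + 1)
    )


attempts = 71              # shots at the rim so far this season
makes = 31                 # "31 or fewer" is the question being asked
worst_season_rate = 0.544  # last year's rate at the rim, his career low

p_this_bad_or_worse = binomial_cdf(makes, attempts, worst_season_rate)
print(f"P({makes} or fewer makes on {attempts} attempts) = {p_this_bad_or_worse:.4f}")
```

The result should land in the neighborhood of the 4.5% quoted above. Swapping in a different baseline, such as a better season, only makes the number smaller.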
That’s low. Like really low. We would have expected, given his percentage during his worst season, that Tyreke would have hit at least a few more layups. Here is where we can say something about small sample sizes. Tyreke has taken 71 layups this year. Last season, he took over 500, which was the most he ever took in a season. The fewest layups he ever took in a season was 287 in 2010-2011, but he only appeared in 53 games. He usually plays about 66 games a year. In an average year, he takes a little fewer than 400 layups. So, if this year is typical, then Tyreke has already taken somewhere between 14% and 19% of his total layups for the season.
Why does this matter? Return to our coin example. Let’s say I ask you to flip a coin with a 50-50 probability of heads or tails 10 times. You get heads 7 times. By pure intuition alone, if I asked you how crazy this result was, you’d probably say, “Meh, it’s more heads than I expected, but I’m not shocked.” Now, let’s do the same thing, but this time you flip it 1,000 times. If you got 700 heads, you’d be pretty shocked, right? In the first case, you figured that if you kept flipping, the tails would eventually catch up. This is one of the problems with small samples. We don’t have enough data to make any conclusive statements about the population mean (i.e. what we are interested in measuring).
I want to talk about one more thing related to small sample sizes. I’ve been working with this data since about game 5 of the season. One thing I’ve noticed is that Tyreke’s shooting percentage has increased dramatically from then to now. He has gone from shooting 39% at the rim to about 43.7% at the rim in only four games. The table below summarizes his totals and marginal increases from the San Antonio game until now.
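To put rough numbers on that intuition, here is a small Python sketch comparing the two coin scenarios. The helper `prob_at_least` is a name I made up for this post. It counts outcomes exactly for a fair coin, so there is no rounding until the final division.

```python
from math import comb


def prob_at_least(heads: int, flips: int) -> float:
    """Exact chance of `heads` or more heads in `flips` fair coin flips."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2**flips  # every sequence of flips is equally likely


p_small = prob_at_least(7, 10)       # 7 or more heads in 10 flips
p_large = prob_at_least(700, 1000)   # 700 or more heads in 1,000 flips

print(f"7+ heads in 10 flips:      {p_small:.4f}")
print(f"700+ heads in 1,000 flips: {p_large:.3e}")
```

Same 70% heads rate in both cases, but the first happens about one time in six, while the second is so unlikely you would start inspecting the coin.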
What’s the point? Well, look at how adding just a few shots over a small number of games dramatically increased Tyreke’s overall layup percentage. When sample sizes are small, things like field goal percentage are very sensitive to adding another observation. Once our number of observations increases to a relatively large number, Tyreke taking and making that next layup won’t really affect his overall percentage. Right now we are so early in the season that a couple of good games can completely change your assessment. That is really the point here. When you draw a conclusion from a small sample size, you could be right or wrong. Tyreke may have a bad season shooting at the rim this year. That is entirely possible. The real point isn’t whether your assessment is right or wrong. The point is that small sample sizes impose some serious limitations, as we have seen from looking at these numbers. In short, the only real thing we can say is that we can’t say much. Small sample sizes are just a hard and unavoidable truth of statistical inquiry.
I could leave it at that, but the social scientist in me won’t allow it. One of the major factors separating the social sciences from the natural sciences is the laboratory. In a lab, you take the one variable you’re interested in, and you say, “I want to measure this while holding everything else constant.” See why it would be hard for an economist who is interested in measuring the effect of education on wages to do that? He can’t just go around assigning a certain number of years of education to thousands of children at random. He doesn’t have a laboratory.
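Here is a quick way to see that sensitivity, sketched in Python. The 31-for-71 line matches the 43.7% quoted earlier. The 175-for-400 line is a made-up late-season total at roughly the same rate, included only for comparison, and `bump_from_one_make` is just an illustrative name.

```python
def bump_from_one_make(makes: int, attempts: int) -> float:
    """Percentage-point change in FG% if the next attempt goes in."""
    before = 100 * makes / attempts
    after = 100 * (makes + 1) / (attempts + 1)
    return after - before


early = bump_from_one_make(31, 71)    # totals at the rim right now
late = bump_from_one_make(175, 400)   # hypothetical totals near a full season

print(f"One more make at 31/71:   +{early:.2f} points")
print(f"One more make at 175/400: +{late:.2f} points")
```

A single made layup moves the early-season number several times more than it moves the late-season one, which is why a couple of good games can swing the whole picture in November.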
We also don’t have a basketball laboratory. There are a lot of factors going on at once. Some we can measure. Even more we can’t, and we certainly can’t hold every other factor constant. Still, when we see something like Tyreke’s poor layup percentage, it is fair to ask why. Why has he shot so poorly? Of course, we already talked about small sample sizes, but what was the cause? Often, it is injuries, but it can also be things like adjusting to a new system, changes to your play style or new teammates, or even the opponents you are playing.
I could go more in depth here and give my own opinions, but I’ve been going on for a while. Let’s just leave it at this. A relatively small number of observations makes it difficult to draw conclusions in any direction. We haven’t seen enough games or layup attempts to know if something is really wrong with Tyreke. His percentage could drop for any number of reasons, but we won’t know more until we approach that magical 20-30 game mark. Then we can say more; until then, it is all conjecture in the small sample size theater.
10 responses to “Tyreke Evans and Small Sample Sizes”
What are the odds given his past 3 point shooting that he would currently be shooting above 50%? I want to believe so bad that it will stay this way forever. I’m not expecting him to shoot above 50% but the low 40’s high 30’s would be nice.
jmsunseri Ha, so I used his best 3 point percentage in a season (33.8%) to answer your question using the same methods I used in the article. If my quick calculations are correct the probability of him shooting as well or better than he has this season, is far below 1%.
So, we’ve been pretty lucky with his 3 point shooting so far.
However, small sample sizes apply here too. He will almost certainly regress, but his shot looks better than it ever has. That is an example of a factor that isn’t measured by data. That is, his shot form looks better to me. He is squaring up to the basket and shooting with one hand.
Teams will adjust to his shot, and he won’t get as many open looks. Still, I wouldn’t be surprised if he shoots much better this season from beyond the arc than last. Now, last year he shot 22%… So anything in the 30’s for an entire year would be a dramatic improvement.
Nicks65 if people actually have to guard him closer to prevent him from taking that shot 1) we create spacing for the entire offense 2) he should be able to penetrate more frequently. I wonder if there would be a positive correlation between hitting 3 pointers and finishing at the rim.
jmsunseri Nicks65 Yeah, I mean another thing that I think you’re suggesting is that as scouting reports change, teams may try and chase him off the 3 point line when he is open. That should open up opportunities for Tyreke to get past his man and get better shots at the rim.
I think that is fair, but we will have to wait, hope, and see.
How about these two different small sample size effects with Tyreke?
Take away the Cleveland game and Tyreke’s FG percentage goes from 40.5% to 43.5% for the 7 games in November. Why? With so few games played to spread a bad game (2-12) FG shooting across, one bad game weighs down his average due to a small sample size.
On the other hand, there is Tyreke’s shooting from 3 point land so far for the year. He has shot 50% or better in every game but two, the Cleveland game and the Orlando game. In each of those two games, however, he only went 0-1 from 3 point land. That’s not really much of a “bad game”. So, with a small sample size, Tyreke’s 3 point shooting for the year has been very consistent and it looks very good.
504ever Yup. Pretty much what we saw with games 5-8. When sample sizes are small, adding just a few observations can dramatically change your result. We need larger numbers before everything smooths out.
Love your statistics. Loved the course in college. The good news is he should get his percentage over 54% with a couple of 80% at the rim games. Let’s revisit his rim shots after 21 games (25.92% of the regular season). I’m confident he will improve. The shocker is his 50% from behind the 3 point line, small sample size or not. He didn’t get 50% from the 3 or the 2 last year, ever!! Keep up the good work.
Austin was real bad at the rim last year.
kfte I don’t know about real bad. He shot a little over 53% last year, which was about Tyreke’s level, and it was higher than what Damian Lillard shot at the rim last season. He wasn’t good, but wasn’t terrible either.
I might argue that the NBA _is_ a basketball laboratory. Great article, thanks.