Welcome to Ellipsix Informatics!
My name is David Zaslavsky and this is my website. I'm a graduate student in physics, and I also do a lot of work with computer programming.
I filed 20 postdoc applications this weekend.
Estimated number of offers: 0.1. That's right, with 22 applications in, I have a one-in-ten chance that I might hear back from one of them.
Real life is no fun.
Anyway, now that my big batch of postdoc applications is filed, I'll get back to the posts I meant to write last month, still hoping to complete the 30 posts I wanted to write in November by the end of this year. There is still lots of interesting physics to cover!
Good news, everyone! Well, good news for me at least: I've been granted a spot at the 2014 Science Online Together conference!
Science Online is an organization that, well, like the name suggests, supports people who promote and develop scientific content on the internet. They manage the Science Seeker blog aggregator and hold several annual conferences to bring together people involved in science online in all capacities. The flagship conference is always held near the beginning of the year in Raleigh, NC, and this time I get to go!
This is good news for you too, though. When I'm not busy conferencing I'll be uploading blog posts and tweeting, so that everyone else can share the experience as much as possible. Stay tuned for that as the conference is running, February 27 to March 1. For now, if you're interested in such things, conference news (and griping about the cost) is flowing under the #scio14 tag on Twitter.
At the beginning of this month, you may remember, I set out to write 30 blog posts in 30 days. Well, there are four days left, and I'm barely a third of the way to my target of blog posts. It turns out that applying for postdoc positions will take up all your time, and then some, leaving precious little for blogging. Which is kind of a shame, because I had some good sciencey posts lined up.
Most of my postdoc application deadlines are coming up this week or early next week, so I have to prioritize those for now. To make it up to you, my reader(s), once I'm done with applications I'll keep going with all the posts I had wanted to write this month. With any luck, I can crank out all 30 by the end of December.
When I arrived in Princeton last Friday, I was greeted with this headline:
Emergency meningitis vaccine will be imported to halt Ivy League outbreak
Emergency doses of a meningitis vaccine not approved for use in the U.S. may soon be on the way to Princeton University to halt an outbreak of the potentially deadly infection that has sickened seven students since March.
Well then. Perfect timing. But seriously, it is actually a perfect time to reflect on why vaccines are necessary in the first place. And it's not (just) for the reason you might think.
If you're vaccinated against a disease, not only does it mean that you won't get sick, it also means that you won't pass that disease on to other people. Vaccinations protect the people around you too. And conversely, even if you're not vaccinated yourself, the more people around you who are, the lower your chances of catching the disease from someone else.
Let me illustrate this with a simple model of how a disease spreads. Imagine a world where people live in apartments on a perfect grid and only ever talk to their four neighbors, once a day.
Suppose one of these people gets sick.
Each time that person talks to their neighbor, they have some chance to pass on the disease. Maybe each day that they're sick, the person has a 20% chance to infect each neighbor. (That's a pretty high chance, but then again in reality most of us talk to a lot more than four people each day.) If the illness lasts three days, the odds are pretty good that one or two neighbors are going to get infected.
(light red squares represent people who have just gotten sick, dark red represent those who have been sick for a while)
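To put rough numbers on "one or two neighbors," here is a small back-of-the-envelope sketch. It assumes each daily conversation is an independent 20% chance of infection, and the variable names are just illustrative.

```python
# Numbers taken from the example above.
P_PER_DAY = 0.20   # chance of infecting a given neighbor on a given day
SICK_DAYS = 3      # how long the illness lasts
NEIGHBORS = 4      # each person talks to four neighbors

# A neighbor escapes only if they dodge the infection on every sick day
# (assumption: the days are independent of each other).
p_neighbor = 1 - (1 - P_PER_DAY) ** SICK_DAYS

# Average number of neighbors infected, and the chance that nobody is.
expected_infected = NEIGHBORS * p_neighbor
p_nobody = (1 - p_neighbor) ** NEIGHBORS

print(f"chance a given neighbor gets sick: {p_neighbor:.3f}")
print(f"expected neighbors infected:       {expected_infected:.2f}")
print(f"chance nobody gets infected:       {p_nobody:.3f}")
```

Under those assumptions each neighbor has a bit less than a coin flip's chance of getting sick, which works out to about two infected neighbors on average.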
And then those neighbors infect their neighbors. The four neighbors of Patient Zero (the first person to get sick) have eight new neighbors of their own, so there are even more opportunities for the infection to spread.
(blue squares represent people who are recovering)
And then those eight neighbors have 12 new neighbors, and those 12 have 16, and so on. Eventually you wind up with a wave of infection spreading back and forth through the population.
What if some of these people are vaccinated?
Vaccines don't work on everyone in real life, but to keep my example simple, I'll pretend vaccinated people are totally immune to the disease. They'll be represented by green squares. Look what happens if even 10% of the people in this imaginary grid city are vaccinated:
See the difference? The infection finds it a lot harder to spread. Instead of disease running rampant throughout the population, it's limited to one little cluster in a corner of the grid. The few people who are vaccinated act as a protective barrier of sorts — maybe one with a few holes in it, but still, enough to significantly hold back the spread of the disease and keep the unvaccinated 90% a lot safer.
Say we increase that to 20% of people getting vaccinated. Here's what that looks like:
Now how about that! The disease just sputters out without ever reaching most of the population!
Maybe that was just a fluke. Let's try again and let the randomness play out differently:
Same thing. One more:
Yet again, the infection disappears after just a few steps of the simulation.
Wait — I lied. Watching these little colored squares is too entertaining. This is the last one I swear:
It takes a little longer this time, but still, the infection disappears.
Evidently, you only need some critical fraction of the population to be vaccinated to stop a communicable disease in its tracks! This phenomenon is called herd immunity.
The critical fraction of vaccinated people you need to produce herd immunity depends on how people in the population interact with each other. In my toy example, there's very little interaction — each person only interacts with four neighbors, once a day. That makes the critical fraction fairly low; just 20% vaccinated is enough to stop the disease after a fairly short time.
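If you want to play with the critical fraction yourself, the toy model can be sketched in a few lines of Python. This is a minimal reimplementation, not the code that made the pictures: the function name `simulate`, the grid size, and the rule that recovered people stay immune are all assumptions made for the sketch.

```python
import random

SUSCEPTIBLE, SICK, RECOVERED, VACCINATED = range(4)
NEIGHBOR_OFFSETS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def simulate(size=30, vaccinated_fraction=0.2, p_infect=0.2, sick_days=3, seed=0):
    """Run one outbreak on a size x size grid; return how many people ever got sick."""
    rng = random.Random(seed)
    # Vaccinate each person independently with the given probability.
    state = [[VACCINATED if rng.random() < vaccinated_fraction else SUSCEPTIBLE
              for _ in range(size)] for _ in range(size)]
    days_left = [[0] * size for _ in range(size)]

    # Patient Zero sits in the middle of the grid (even if that spot was vaccinated).
    middle = size // 2
    state[middle][middle] = SICK
    days_left[middle][middle] = sick_days
    ever_infected = 1

    while True:
        sick = [(r, c) for r in range(size) for c in range(size) if state[r][c] == SICK]
        if not sick:
            return ever_infected  # the outbreak has died out
        newly_infected = set()
        for r, c in sick:
            # One conversation per day with each of the four neighbors.
            for dr, dc in NEIGHBOR_OFFSETS:
                nr, nc = r + dr, c + dc
                if (0 <= nr < size and 0 <= nc < size
                        and state[nr][nc] == SUSCEPTIBLE
                        and rng.random() < p_infect):
                    newly_infected.add((nr, nc))
            days_left[r][c] -= 1
            if days_left[r][c] == 0:
                state[r][c] = RECOVERED  # assumption: recovered people stay immune
        for r, c in newly_infected:
            state[r][c] = SICK
            days_left[r][c] = sick_days
            ever_infected += 1

if __name__ == "__main__":
    for fraction in (0.0, 0.1, 0.2):
        sizes = [simulate(vaccinated_fraction=fraction, seed=s) for s in range(20)]
        print(f"{fraction:.0%} vaccinated: average outbreak size {sum(sizes) / len(sizes):.1f}")
```

Because the rules here are simplified, the exact numbers will not necessarily match the pictures, but averaging over many random seeds is a handy way to see how the outbreak size responds to the vaccinated fraction.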
In real life, people interact a lot more, which means the critical fraction is higher: 80%, 90%, 95%, or more. A community can only afford to leave a few people unvaccinated without spoiling the herd immunity. It's important to leave those "slots" for those few people who have legitimate medical reasons not to take a vaccine, so that they can be protected by the rest of us.
For the curious, here is the (completely unpolished) Python code I used to create the pictures.
Evidently my post from a week ago on the rate of fires in Tesla electric cars compared to gas cars couldn't have come at a more appropriate time. People are still harping on the recent string of Tesla Model S fires, despite the fact that — as I showed in my last post — there's no evidence to suggest that the fire risk in a Tesla is any greater than that of a regular car. In fact, if anything it seems to be slightly less.
In my last post I kind of hinted at the fact that the rate of fires isn't the whole story. Even if a fire does happen, your risk of getting injured or killed is different in a Tesla than a normal car. Something similar goes for other types of accidents. So if you want to tell whether Teslas are safe, what you probably should be looking at is the overall rate of injuries and fatalities for Tesla drivers and passengers, compared to the equivalent for gas cars. And that number tells a very interesting story: Tesla CEO Elon Musk has written a new blog post which emphasizes that not one person has ever been killed or seriously injured while driving or riding in a Tesla!
It's an impressive record, but as Musk admits,
Of course, at some point, the law of large numbers dictates that this, too, will change, but the record is long enough already for us to be extremely proud of this achievement.
Well, I happen to know a thing or two about the law of large numbers (and so do you, if you read my earlier post). Let's see what we can tell from the fact that Teslas have been driven as much as they have without a serious accident. Just how proud should Tesla Motors be about this?
Here I'm just going to apply the same statistical methods from my other post, except now focusing on the number of deaths and serious injuries, instead of the number of fires. So now the quantity of interest is the true, underlying probability per mile of a driver or passenger of a Tesla being seriously injured or killed for any reason related to the car.
As before, we don't know that probability, but we do know that the corresponding event (death or serious injury) has happened zero times over a hundred million miles driven. If you followed my last post, then, you'll understand how the maximum likelihood estimate (the "best guess") of it is just zero, which would mean that Teslas protect you perfectly from traffic-related injuries.
As much as Elon Musk (and a lot of other people) would like it to be the case, I think we all know that's not realistic.
Time to move on to a more sophisticated analysis. Remember that in my earlier blog post, I started with the formula for a Poisson distribution, which for a given value of the underlying rate, gives the probabilities of experiencing various numbers of events (fires, in that case) per hundred million miles driven.
But remember, we don't know the underlying rate. We know the actual number of events that happened, which is zero. So as I showed in my previous post, we fix the number of events to be that specific value, rename this quantity as a "likelihood" instead of a probability, and compare it for different guesses at the rate.
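As a quick illustration of that switch from probability to likelihood, here is a minimal sketch. The function name and the list of guesses are hypothetical; the rate is taken to be the expected number of events per hundred million miles.

```python
import math

def poisson_probability(n, rate):
    """Probability of exactly n events when `rate` events are expected on average."""
    return rate ** n * math.exp(-rate) / math.factorial(n)

# Probability view: fix the rate, vary the number of events.
for n in range(4):
    print(f"rate = 2.0, n = {n}: probability {poisson_probability(n, 2.0):.4f}")

# Likelihood view: fix the observed count, vary the guess at the rate.
observed = 0  # no deaths or serious injuries so far
for guess in (0.5, 1.0, 2.0, 5.0):  # arbitrary guesses, purely for illustration
    print(f"guess = {guess}: likelihood {poisson_probability(observed, guess):.4f}")
```

The same function does both jobs; the only thing that changes is which argument you hold fixed.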
In the earlier post, an "event" was a fire, and there were of them in a hundred million miles. That gave us the yellow curve in this graph:
This time, an "event" is a death or serious injury. There are zero of them. That gives us the green curve in this graph:
My next step in the earlier post was to divide by the maximum likelihood and take the logarithm, giving
For fires, this gives a plot you might recognize from before,
whereas for deaths and serious injuries, the maximum likelihood estimate occurs at zero and we get this:
Interesting — it's linear! This happens whenever the number of events observed is zero, as you can work out from the equations above:
(in case it bothers you, ).
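For reference, here is one way the linearity can be worked out. The notation is an assumption, since the original symbols are not shown here: $\lambda$ stands for the expected number of events per hundred million miles, $n$ for the observed count, and the log likelihood ratio is multiplied by the conventional factor of $-2$.

```latex
\begin{align}
  L(\lambda) &= \frac{\lambda^{n} e^{-\lambda}}{n!},
  \qquad \hat{\lambda} = n \\
  -2 \ln \frac{L(\lambda)}{L(\hat{\lambda})}
    &= 2\left[\lambda - n + n \ln\frac{n}{\lambda}\right] \\
  n = 0: \quad
  -2 \ln \frac{L(\lambda)}{L(0)} &= -2 \ln \frac{e^{-\lambda}}{1} = 2\lambda
\end{align}
```

The last line uses the conventions $0^0 = 1$ and $0! = 1$, so the maximum likelihood itself is $L(0) = 1$ and only the exponential survives, leaving a straight line in $\lambda$.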
On that last graph, you'll see that I marked two dotted vertical lines, indicating the two hypotheses I want to test. A little explanation of these hypotheses is in order, because there was a bit of work involved in coming up with them.
See, Tesla's blog post says there has never been a death or a serious injury in a Tesla. So in order to figure out whether Teslas are statistically safer than normal cars, I need to look at the rate of fatalities and serious injuries in normal cars. Basically that's just the equivalent of that underlying rate for a gas car. The specific quantitative hypothesis we should try to reject can be stated like this:
The rate of fatalities and serious injuries in a Tesla is at least as great as the rate of fatalities and serious injuries in a normal car.
I can find from an NHTSA report (PDF) that in 2012, there were 1.14 deaths and 80 injuries per hundred million miles driven on US roads overall. Most injuries in traffic accidents are minor, though. So all that tells us is that the total rate of deaths and serious injuries was somewhere between 1.14 and 81 per hundred million miles. That's a pretty wide range.
We can start by checking the boundaries of this range. First, consider the lower boundary, corresponding to a rate of 1.14 per hundred million miles. In words it would be stated like this:
The rate of fatalities and serious injuries in a Tesla is at least as great as the rate of fatalities in a normal car.
This hypothesis is represented by the leftmost dotted line, which crosses the log likelihood curve (the green line) at a fairly low value, which is not really that unlikely. Remember from my earlier post that a 95% confidence level is probably the most common threshold for statistical significance, albeit a rather weak one (you get it wrong 5% of the time), and this hypothesis doesn't even meet that threshold. So if gas-powered cars had 1.14 deaths and serious injuries per hundred million miles, we would not be able to confidently say that Teslas are safer.
On the other hand, think about the upper boundary, corresponding to a rate of 81 per hundred million miles. In words it would be
The rate of fatalities and serious injuries in a Tesla is at least as great as the rate of fatalities and all injuries in a normal car.
In this case, the relevant value is way off the right edge of the chart! A vertical line at that value of the rate would cross the (green) log likelihood line at a value which is huge. That's very unlikely. If that were the rate of deaths and serious injuries, we could conclude that Teslas are safer and effectively never be wrong. (Check your understanding, if you like, by working out the chance of being wrong from the cumulative distribution function of the chi-square probability distribution.)
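If you'd rather let a computer do that check, here is a sketch using only the standard library. It rests on two assumptions that are not spelled out above: that the test statistic is $-2$ times the log likelihood ratio (which equals twice the hypothesized rate when zero events are observed), and that it is compared against a chi-square distribution with one degree of freedom. The helper name is hypothetical.

```python
import math

def chi2_tail_1dof(x):
    """Chance that a chi-square variable with 1 degree of freedom exceeds x.

    With one degree of freedom this equals the two-sided normal tail,
    which the standard library exposes through math.erfc.
    """
    return math.erfc(math.sqrt(x / 2))

# Lower and upper boundaries of the range, per hundred million miles.
for rate in (1.14, 81.0):
    statistic = 2 * rate  # assumed form of the statistic for zero observed events
    print(f"rate {rate}: statistic {statistic:.2f}, tail probability {chi2_tail_1dof(statistic):.3g}")
```

Under these assumptions the lower boundary gives a tail probability above 5%, so it cannot be rejected at the usual threshold, while the upper boundary gives a number so small it is effectively zero.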
Evidently, depending on just how many of those 80 injuries per hundred million miles are "serious," we may or may not be able to reject the original hypothesis (remember, that means we may or may not be able to conclude that Teslas are safer than normal cars). So I did some digging into data provided by the National Highway Traffic Safety Administration to try to pin this number down. It turns out that every year, NHTSA investigators go out and collect a fairly large (around 60 thousand) sample of police reports about car accidents, and incorporate them all into a database called NASS, the National Automotive Sampling System. From the NASS/GES data for 2012, I was able to determine that out of the car accidents that year in which someone was injured (but not killed), 22.6% involved at least one serious injury (for what seems like a reasonable definition of serious). So it stands to reason that 22.6% of 80, or about 18.1, is a reasonable estimate of the number of traffic accidents per hundred million miles that involved a serious injury.
Adding that figure of 18.1 to the 1.14 fatalities per hundred million miles, I get the overall estimate of deaths and serious injuries per hundred million miles to be 19.3. The corresponding hypothesis is that the rate in a Tesla is at least 19.3 per hundred million miles, and that is marked with the vertical dotted line at the right of the last graph. That line intersects the green log likelihood line at a value which exceeds any reasonable threshold for statistical significance: not only the common 95% threshold, but also the five-sigma threshold used in particle physics, and even the six-sigma threshold used in high-precision manufacturing. At this value of the rate, we have only a one-in-two-billion chance of rejecting the hypothesis if it is in fact true.
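The arithmetic behind those last two paragraphs can be sketched as follows. As before, this assumes the statistic for zero observed events is twice the hypothesized rate and follows a chi-square distribution with one degree of freedom, in which case its square root is the equivalent number of sigmas. The variable names are illustrative, and the total of 19.3 is used as quoted rather than recomputed.

```python
import math

INJURIES_PER_100M_MILES = 80.0   # all injuries, from the NHTSA report
SERIOUS_FRACTION = 0.226         # share involving a serious injury (NASS/GES 2012)
QUOTED_TOTAL_RATE = 19.3         # deaths plus serious injuries, as quoted above

serious_rate = SERIOUS_FRACTION * INJURIES_PER_100M_MILES
print(f"serious injuries per hundred million miles: {serious_rate:.1f}")

statistic = 2 * QUOTED_TOTAL_RATE          # assumed form of the test statistic
sigmas = math.sqrt(statistic)              # valid for one degree of freedom
p_value = math.erfc(sigmas / math.sqrt(2)) # two-sided normal tail
print(f"equivalent significance: {sigmas:.2f} sigma")
print(f"chance of wrongly rejecting: about 1 in {1 / p_value:.2g}")
```

With these inputs the significance comes out a little above six sigma, consistent with the one-in-two-billion figure.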
So as far as we can tell from the statistics, we can pretty confidently state that Teslas are safer than gas-powered cars.