<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Small Gray Matters &#187; statistics</title>
	<atom:link href="http://www.smallgraymatters.com/category/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.smallgraymatters.com</link>
	<description>of brains and their minds</description>
	<lastBuildDate>Fri, 18 Sep 2009 01:27:48 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>why base rates matter</title>
		<link>http://www.smallgraymatters.com/2009/09/13/why-base-rates-matter/</link>
		<comments>http://www.smallgraymatters.com/2009/09/13/why-base-rates-matter/#comments</comments>
		<pubDate>Mon, 14 Sep 2009 02:51:34 +0000</pubDate>
		<dc:creator>small and gray</dc:creator>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[tutorials]]></category>
		<category><![CDATA[adhd]]></category>
		<category><![CDATA[base rates]]></category>
		<category><![CDATA[cancer]]></category>
		<category><![CDATA[cell phones]]></category>
		<category><![CDATA[death]]></category>
		<category><![CDATA[driving]]></category>
		<category><![CDATA[stimulants]]></category>

		<guid isPermaLink="false">http://www.smallgraymatters.com/?p=56</guid>
		<description><![CDATA[Here are three recent scientific findings you may or may not have heard about:
1. The use of stimulant medications commonly prescribed for ADHD is associated with a nearly 8-fold increase in the likelihood of dying suddenly among children aged 7 &#8211; 19.
2. Gum disease increases the risk of head and neck cancer quite dramatically: for [...]]]></description>
			<content:encoded><![CDATA[<p>Here are three recent scientific findings you may or may not have heard about:</p>
<p>1. The use of stimulant medications commonly prescribed for ADHD is associated with <a href="http://ajp.psychiatryonline.org/cgi/content/full/166/9/992">a nearly 8-fold increase in the likelihood of dying suddenly</a> among children aged 7 &#8211; 19.</p>
<p>2. Gum disease <a href="http://cebp.aacrjournals.org/content/18/9/2406.full">increases the risk of head and neck cancer</a> quite dramatically: for every millimeter of alveolar bone loss (i.e., loss of the bone that surrounds the roots of your teeth), there is a more than fourfold increase in the risk of cancer (note: article requires paid access).</p>
<p>3. People who talk on a cell phone while driving are <a href="http://www.vtti.vt.edu/PDF/7-22-09-VTTI-Press_Release_Cell_phones_and_Driver_Distraction.pdf">1.3 times more likely to have an accident</a> than people who drive without any distractions.</p>
<p>At a cursory glance, all three of these stories seem like pretty bad news. And they are. But one of them is actually much worse than the others. Your job is to decide which one; take a moment to think about it, then read on.</p>
<p>If you&#8217;re like most people, you probably picked either the first or the second story. After all, it&#8217;s pretty terrible to think of children dying suddenly, or of getting cancer of the head and neck. Sudden death implies death for certain, and cancer implies death with a high probability. Most of us generally don&#8217;t see death as a good thing, so we want to avoid those outcomes. Car accidents aren&#8217;t anyone&#8217;s idea of a good time, of course; but at least most car accidents aren&#8217;t fatal. And then there&#8217;s the matter of the differing odds to consider: in the first story, the negative outcome is 8 times as likely, and in the second, it&#8217;s 4 times as likely, but in the third story, it&#8217;s only 1.3 times as likely. Surely then, it&#8217;s more important to avoid taking stimulant drugs and to brush and floss regularly than to worry about talking on a cell phone!</p>
<p>Well, as you might have guessed from the fact that I started the previous paragraph with &#8220;if you&#8217;re like most people&#8230;&#8221;, the truth is actually somewhat counterintuitive. The fact of the matter is that, even if the above stories are completely true (and as far as I know, they are, pending further research), turning off your cell phone when you drive is probably a much, much better way to minimize your chance of dying early than swearing off stimulants or practicing great oral hygiene (though the latter is still important!). The reason is that the information I gave you in the three stories above neglects what&#8217;s probably the most important piece of information of all to consider: the base rate (or frequency) of each event occurring.</p>
<p>Let&#8217;s add some context to each of the three stories. Take the first one. It&#8217;s true (at least based on one preliminary study) that kids who take stimulant medications are much more likely to die suddenly than kids who don&#8217;t. But the critical thing to consider is the base rate of sudden death. You probably won&#8217;t be surprised to hear that the odds of dying suddenly are <em>incredibly</em> low when you&#8217;re 7 &#8211; 19 years old. It&#8217;s unclear exactly how low they are, but consider that the study that reported this finding scoured state databases between the years of 1985 and 1996 and still only came up with 564 cases of sudden death. That&#8217;s a tiny, tiny, tiny fraction of the number of kids who make it past 19 years of age in good health. Suppose we say that the probability of sudden death for a kid in this age range is 0.0001% per year. An eightfold increase would mean that the average kid goes from a one in a million chance to just under a one in a hundred thousand chance of dying per year. And of course, it&#8217;s not average kids for whom stimulant medications are prescribed; usually, there&#8217;s a condition (e.g., ADHD) that the drugs are intended to alleviate. When you weigh the increase in the negligible likelihood of sudden death against the very <a href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1868385">sizeable benefits conferred by stimulant medications</a>, it&#8217;s clear that this finding isn&#8217;t really cause for alarm. As <a href="http://psychcentral.com/blog/archives/2009/09/01/adhd-stimulants-children-and-sudden-death/">John Grohol notes</a>, &#8220;The finding is of greater interest in trying to understand why it’s occurring at all, not for anyone to make a treatment decision based upon it.&#8221;</p>
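<p>To make the arithmetic concrete, here is a minimal sketch of the relative-versus-absolute risk calculation. The 0.0001% base rate is the same made-up figure supposed in the paragraph above, not a measured one, and the variable names are purely illustrative.</p>

```python
# Hypothetical base rate from the paragraph above: 0.0001% per year.
base_rate = 0.0001 / 100      # probability per child per year (one in a million)
relative_risk = 8             # the "nearly 8-fold" increase reported by the study

# Relative risk multiplies the base rate; the absolute change is the difference.
exposed_rate = base_rate * relative_risk
extra_per_million = (exposed_rate - base_rate) * 1_000_000

print(f"baseline: 1 in {1 / base_rate:,.0f}")
print(f"with stimulants: 1 in {1 / exposed_rate:,.0f}")
print(f"extra cases per million children per year: {extra_per_million:.0f}")
```

<p>The relative risk sounds dramatic, but under this assumed base rate the absolute change is a handful of cases per million children per year.</p>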
<p>What about the second story? Well, you can probably already see where this is going. Head and neck cancer is quite rare, accounting for fewer than 50,000 new cases per year in the United States. In other words, approximately one in every 6000 people develops head and neck cancer in any given year. This of course includes both people who have good oral hygiene and people who don&#8217;t, so the reality is that, even if you have terrible oral hygiene and rampant gum disease, you&#8217;re very unlikely to ever develop head and neck cancer. What&#8217;s more, there are other factors that pose an even greater risk of head and neck cancer than gum disease does (e.g., smoking). This isn&#8217;t to say that you shouldn&#8217;t brush your teeth, of course; there are plenty of other good reasons to take good care of your gums. It&#8217;s just to say that you shouldn&#8217;t lose any sleep over the prospect of developing head and neck cancer because of your gums. In the grand scheme of things, there are any number of other things you should be much more concerned about.</p>
<p>One of the things you should be much more concerned about, actually, is your risk of having a car accident while talking on your cell phone. Unlike sudden death in children and head and neck cancers in adults, the odds of dying in a car accident are not very small. Worldwide, <a href="http://en.wikipedia.org/wiki/Causes_of_death">approximately 2% of deaths</a> every year are caused by road accidents. And that&#8217;s to say nothing about serious injuries sustained in non-fatal accidents. Put simply, a 1.3-fold increase in the likelihood of having a car accident is not trivial. If we do a back-of-the-envelope calculation and assume that the odds of <em>dying</em> in a car accident increase by the same proportion (i.e., that drivers on cell phones don&#8217;t have more serious accidents than drivers off cell phones, which is debatable), it turns out that you can reduce your overall odds of dying in any given year by about 0.6% just by not talking on your cell phone while driving. Admittedly, that&#8217;s a very loose estimate that&#8217;s based on questionable data and many simplifying assumptions. And it&#8217;s not like it&#8217;s a dramatic reduction by any stretch (which only goes to further illustrate the importance of considering base rates). But the point is, there are probably relatively few lifestyle changes you could make this year that would require so little effort for such a large benefit. So take your ADHD meds, brush your teeth regularly, and don&#8217;t talk on your cell phone while driving.</p>
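<p>The back-of-the-envelope figure can be sketched in a few lines. Both readings below are assumptions about how to turn the 2% and 1.3 figures into a single number: the first is one way of arriving at roughly 0.6%, and the second asks what a driver who is <em>always</em> on the phone would save by dropping back to baseline risk. Neither accounts for how much time people actually spend on the phone while driving.</p>

```python
road_share = 0.02        # share of all deaths caused by road accidents (from the text)
relative_risk = 1.3      # accident risk on the phone vs. undistracted (from the text)

# Reading 1 (assumption): treat the extra 30% risk as a share of road deaths.
loose_estimate = road_share * (relative_risk - 1)

# Reading 2 (assumption): a driver who is always on the phone returns to baseline.
always_on_phone = road_share * (1 - 1 / relative_risk)

print(f"loose estimate: {loose_estimate:.2%}")
print(f"always-on-phone driver: {always_on_phone:.2%}")
```

<p>Either way the answer lands at around half a percent, which is small in absolute terms but far larger than the changes in the first two stories.</p>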
<p>For a nice overview of empirical data on the base rate fallacy, see this <a href="http://www.bbsonline.org/Preprints/OldArchive/bbs.koehler.html">article in BBS</a>. For more blogospheric bloviation on base rates, see <a href="http://www.spaceandgames.com/?p=59">here</a>, <a href="http://news.bbc.co.uk/2/hi/uk_news/magazine/8153539.stm">here</a>, and <a href="http://michaelgr.com/2007/11/24/cognitive-bias-base-rate-fallacy/">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.smallgraymatters.com/2009/09/13/why-base-rates-matter/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A primer on power</title>
		<link>http://www.smallgraymatters.com/2006/12/04/a-primer-on-power/</link>
		<comments>http://www.smallgraymatters.com/2006/12/04/a-primer-on-power/#comments</comments>
		<pubDate>Tue, 05 Dec 2006 05:06:23 +0000</pubDate>
		<dc:creator>small and gray</dc:creator>
				<category><![CDATA[methodology]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[tutorials]]></category>

		<guid isPermaLink="false">http://www.smallgraymatters.com/2006/12/04/a-primer-on-power/</guid>
		<description><![CDATA[I&#8217;d like to title this post “a power primer,” but that’s the title of a 1992 Psychological Bulletin article by Jacob Cohen (the god of power analysis, now deceased). So instead I’ve titled it “a primer on power.” By changing a few words around I’ve very cleverly gone from academic plagiarism to paying homage. (And [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;d like to title this post “a power primer,” but that’s the title of <a href="http://www.education.wisc.edu/elpa/academics/syllabi/2006/06Spring/825Borman/Cohen1992.pdf">a 1992 Psychological Bulletin article by Jacob Cohen</a> (the god of power analysis, now deceased). So instead I’ve titled it “a primer on power.” By changing a few words around I’ve very cleverly gone from academic plagiarism to paying homage. (And it really is one: I think Cohen’s article, and his lengthier works on power, should be required reading for behavioral scientists of all stripes).</p>
<p>Power is one of the most misunderstood and/or underappreciated concepts in scientific research. Simply put, it refers to the probability of detecting an effect in your sample when it is in fact present in the population (i.e., when it’s ‘real’). If your study has, say, 90% power to detect a difference in the length of socks worn by basketball players as compared to soccer players, that means that <em>if there really is a difference</em> between basketball and soccer players’ sock length, there’s a 9 in 10 chance on average that you’ll be able to detect it in your sample.</p>
<p>In general, power is a good thing, and you want to have as much of it as you can. In an ideal world, scientific experiments would have 100% power to detect effects. Unfortunately, that doesn’t happen in the real world, because to have 100% power (i.e., complete certainty), you’d need to sample the entire population of interest, which isn’t very practical (that’s a lot of players, and twice as many socks). In practice, researchers’ sample sizes are constrained by resource considerations. And so, as a result, is power. Any time you conduct an experiment with a finite sample, you’re taking the risk that you might miss an effect even if it really does exist, simply because of blind (mis)fortune. And in general, the smaller your sample, the greater the probability of you missing an effect. This idea is intuitive enough to most people: it seems pretty obvious that if you want to know whether men are taller than women, you don’t want to base your judgment on the difference in height between just one man and one woman. If you did, you&#8217;d run the risk that you just happened to pick a particularly short man and/or a particularly tall woman. The more men and women you measure, the more the random variations from the mean average out, and the smaller the odds of mistakenly concluding that there’s no gender difference in height.</p>
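<p>A small simulation makes the link between sample size and power easier to see. This is only a sketch under simplifying assumptions: two groups, normally distributed scores with a known standard deviation of 1, a true difference of half a standard deviation, and a two-sided test at the 5% level. The function name and default values are made up for illustration.</p>

```python
import random
from statistics import NormalDist

def simulated_power(n_per_group, true_diff=0.5, sd=1.0, alpha=0.05, runs=2000, seed=1):
    """Share of simulated studies that detect a real difference of true_diff."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)      # two-sided cutoff
    se = sd * (2 / n_per_group) ** 0.5              # standard error of the difference
    hits = 0
    for _ in range(runs):
        a = sum(rng.gauss(true_diff, sd) for _ in range(n_per_group)) / n_per_group
        b = sum(rng.gauss(0.0, sd) for _ in range(n_per_group)) / n_per_group
        if abs(a - b) / se > crit:                  # study calls the effect significant
            hits += 1
    return hits / runs

for n in (5, 20, 80):
    print(n, simulated_power(n))
```

<p>The effect is real in every simulated study; the only thing that changes is how often a sample of a given size manages to pick it up.</p>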
<p>Where confusion starts to set in (and the impetus for this post) is that the intimate link between sample size and power often leads people (including many scientists) to suppose that there’s a single ‘right’ sample size for all research studies of a particular kind. It’s not uncommon to hear people say things like, “we can&#8217;t trust that study because it&#8217;s based on only 50 people! They need at least 300 to be able to say anything meaningful about the general population!” (Actually this sort of statement also betrays another kind of confusion that relates to the difference between Type I and Type II errors, but that’s a separate issue). The problem is that statistical power depends not only on sample size, but also on two other numbers: the size of the effect, and the stipulated false positive rate (also referred to as alpha, or the Type I error rate).</p>
<p>The importance of the first of these, effect size, is easy to see intuitively. Suppose that the average height difference between men and women was 2 feet rather than several inches. How hard would it be to detect that difference and conclude it exists? Not very. A group of curious alien taxonomists wouldn’t need to abduct very many humans before they figured the gender difference out, simply because the vast majority of men would be taller than the vast majority of women, and the difference would hit the aliens right between the antennae. On the other hand, if the mean height difference was only 1/10th of an inch, our aliens would need to abduct a lot of humans and measure them very carefully before they’d be in a good position to claim that a height difference exists. Simply put, if the effect you’re looking for is large, it takes fewer subjects in order to detect it. Or, more formally, one’s power to detect an effect increases with the magnitude of the effect (though not in strict proportion to it), when holding sample size constant.</p>
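<p>Here is a rough sketch of how power climbs with effect size when the sample size is held fixed. It uses a normal approximation for a two-group comparison, which is an assumption made for simplicity (an exact calculation would use the t distribution), and the function name is hypothetical.</p>

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-group comparison (two-sided test)."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5      # expected value of the test statistic
    return z.cdf(ncp - crit) + z.cdf(-ncp - crit)

for d in (0.1, 0.5, 1.0, 2.0):
    print(f"d = {d}: power with 20 per group = {approx_power(d, 20):.2f}")
```

<p>Note that when the effect is zero, the same function simply returns the false positive rate, which is a handy sanity check.</p>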
<p>The second parameter, false positive rate, is somewhat less intuitive. The basic idea is that, since sampling is random and error necessarily creeps in, on rare occasions, researchers are going to end up concluding that an effect exists in the population even though it doesn’t really. Just how often such errors occur is typically a matter of stipulation: scientists will decide that they can accept a false positive occurring, say, 1 out of every 20 times, and adjust their statistical tests accordingly. Conventionally, the false positive rate is set to 5% (and significance tests are therefore conducted at p < .05). Because the convention is so strong, it’s often easy to overlook the false positive rate in power calculations and just default to the standard 5% level. Nonetheless, there is a relationship: the more conservative your statistical test (i.e., the smaller the false positive rate you&#8217;re willing to accept), the lower your power gets. In less technical terms, it&#8217;s kind of like saying that if you only want to be <em>fairly</em> sure that an effect holds true, you don&#8217;t need to look very hard. But if you want to be <em>really</em> sure, you need to double- and triple-check. And double- and triple-checking requires more observations (i.e., more subjects).</p>
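<p>To see what a stipulated false positive rate means in practice, the sketch below simulates many studies in which the true effect is exactly zero and counts how often the test comes out &#8220;significant&#8221; anyway. The setup (a one-sample test with a known standard deviation of 1 and 30 subjects per study) is an assumption chosen to keep the code short.</p>

```python
import random
from statistics import NormalDist

def false_positive_rate(alpha=0.05, n=30, runs=5000, seed=7):
    """Run many studies where the true effect is zero and count 'significant' ones."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided cutoff
    se = 1.0 / n ** 0.5                          # SD is assumed known and equal to 1
    hits = 0
    for _ in range(runs):
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        if abs(sample_mean) / se > crit:         # a false alarm
            hits += 1
    return hits / runs

print(false_positive_rate(alpha=0.05))
print(false_positive_rate(alpha=0.01))
```

<p>Tightening alpha cuts the false alarms, but the stricter cutoff applies to real effects too, which is exactly why power drops.</p>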
<p>Given that power depends on these three parameters (sample size, effect size, and false positive rate), how much power is enough? It’s widely accepted that a reasonable level of power is 80-85%. I say ‘widely accepted’ because when people stop to think about what level of power they find acceptable, their answer tends to be in that ballpark (i.e., 4 times out of 5, your experiment will detect the effect you want if it really exists). But that’s not to say that most studies actually <em>have</em> that level of power in practice. One of the most remarkable findings (and one that’s been demonstrated over and over again) made by statisticians interested in power is that an absurdly large proportion of studies in many disciplines simply don’t have the necessary power to detect the effects they hypothesize. In the article I linked to at the beginning, Jacob Cohen points out that an analysis he conducted in 1960 indicated that the average social psychology study had only 48% power to detect moderate-sized effects. In Cohen’s words, “the chance of obtaining a significant result was about that of tossing a head with a fair coin” (p. 155). And that’s on <em>average</em>; presumably there are a good number of studies that have set out to identify effects they have <em>no real chance of detecting even if they’re actually present in the population</em>.</p>
<p>Cohen then went on to note that other statisticians conducting similar reviews have shown no improvement in the average level of power in the decades since. For anyone actively involved in research, or even for casual consumers of science, this should raise red flags all over the place. There really is no excuse for failing to do a simple power calculation before beginning to collect data. It’s not as though power calculation is a tedious process: all you have to do is plug two or three numbers into an online worksheet, and poof, you get your answer instantly. And yet many, maybe even most, scientists fail to do so.</p>
<p>In fairness, doing a power calculation isn’t quite <em>that</em> easy, because you rarely know the exact size of the effect you’re seeking. If you did, you probably wouldn’t need to do the study in the first place! While it’s easy to decide you’d like your study to have, say, 80% power, it’s not so easy to come up with a reasonable estimate of effect size.</p>
<p>Suppose for example that we want to know if there’s a correlation between people’s mood and the amount of television they watch daily. Let’s stipulate our power has to be around 80% (we don’t want to do our study if we don’t think there’s at least a 4 in 5 chance of detecting an effect), and we’ll test our hypothesis at the conventional level of p < .05. How many subjects do we need to collect data from? Well, depends. If the correlation between mood and television watching in the general population is <em>large</em> (canonically, around r = .5), we’re only going to need 29 people to have an 80% chance of detecting it. If it’s <em>medium</em> (say, r = .3), we’re going to have to round up 85 people. But if it’s only a <em>small</em> effect (say, r = .1, or an overlap of only 1% of the total variance in each measure), we’re faced with the daunting prospect of chasing down 785 subjects! Note that in all 3 of these cases, we’re assuming that there <em>really is a correlation between mood and television-watching</em>. The only difference is how strong that effect is.</p>
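<p>For readers who want to try this themselves, here is a minimal sketch of a sample size calculation for a correlation. It uses the Fisher z approximation, which is an assumption and not necessarily the method behind the figures quoted above, so it gives numbers in the same ballpark as those figures without matching them exactly. The function name is made up for illustration.</p>

```python
from math import atanh, ceil
from statistics import NormalDist

def n_for_correlation(r, power=0.80, alpha=0.05):
    """Approximate N needed to detect a population correlation r (two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # cutoff for the chosen false positive rate
    z_power = z.inv_cdf(power)           # quantile for the desired power
    # Fisher's transform: atanh(r) has a standard error of about 1 / sqrt(N - 3).
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)

for r in (0.5, 0.3, 0.1):
    print(r, n_for_correlation(r))
```

<p>The pattern is the important part: halving the expected correlation roughly quadruples the number of subjects you need.</p>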
<p>Of course, power calculations don’t always have to mean bad news. For example, in my area of research (functional neuroimaging), power calculations are often quite comforting. It’s an interesting quirk that people often criticize imaging studies for having small samples, when in fact imaging studies probably don’t have lower power on average than other kinds of studies (at least for standard experimental, within-subject analyses). The knee-jerk reaction is understandable though, because many psychologists (particularly in social or personality psychology) are used to working with samples in the hundreds. If that’s your background, it’s no surprise that when you come across neuroimaging studies that used samples of only 15 subjects (a pretty standard size), you’re going to think something’s horribly wrong.</p>
<p>In fact, there’s nothing wrong, because it turns out (fortuitously!) that effect sizes in functional neuroimaging studies tend to be huge. It’s not uncommon to see effect sizes around d = 2 (d is a standardized measure of effect size popularized by Cohen; it’s measured in standard deviations, so a d of 2 means the difference in neural activation between two experimental conditions is around 2 standard deviations). Effects that large are unheard of in most other disciplines. Consider that Cohen himself considered anything above d = 0.8 a ‘large’ effect (this is just a heuristic of course—the meaning of ‘large’ differs considerably across research areas!).</p>
<p>A quick power calculation reveals that a study with 12 subjects has essentially 100% power to detect an effect size of 2 at p < .05. Basically, if the population effect really is that big, you’re not going to miss it. In fact, with only 2 subjects, you’d still have an 88% shot of detecting it. This explains why early neuroimaging studies that often had only 3 or 4 subjects were able to obtain replicable results. In the early days, when little was known about the relationship between specific cognitive tasks and neural activity in humans, researchers used very broad experimental task contrasts specifically intended to elicit very large, very obvious changes in activation (e.g., comparing activation during a working memory task to a passive resting state). The effects were (not surprisingly, in hindsight) enormous. As time goes on and our knowledge of the functional neuroanatomy of cognition builds up, hypotheses become more subtle, and effect sizes diminish, requiring larger samples.</p>
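<p>As a rough check, a one-tailed normal approximation gives figures close to the ones above. That choice of approximation is an assumption, not a claim about how the original numbers were computed, and it is optimistic for very small samples: an exact t-based calculation gives noticeably lower power when there are only 2 or 3 subjects.</p>

```python
from statistics import NormalDist

def approx_power_one_sample(d, n, alpha=0.05):
    """One-tailed, normal-approximation power for a within-subject effect of size d."""
    z = NormalDist()
    # The expected test statistic is d * sqrt(n); compare it with the one-tailed cutoff.
    return z.cdf(d * n ** 0.5 - z.inv_cdf(1 - alpha))

for n in (2, 4, 12):
    print(n, round(approx_power_one_sample(2.0, n), 3))
```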
<p>Of course, imaging studies usually don’t test for effects at p < .05, for reasons I won’t go into here (mainly the need to correct for multiple comparisons). Still, even at p < .001, a study with 15 subjects has 70% power. That’s not great, but it’s a comparable level to what you’ll find in many behavioral studies. Bump the sample up to 20 subjects, and power is now 92%, which is more than acceptable.</p>
<p>Hopefully, these examples make clear the importance of (a) conducting power calculations <em>before</em> starting to collect data, and (b) having some reasonable notion as to what the population effect size might be (e.g., based on related effects that have already been identified). Even if you&#8217;re never going to collect any data yourself, and just want to be an informed consumer of scientific literature, it pays to know something about power. Remember: effect size matters. The fact that a study only has 10 people doesn&#8217;t necessarily mean it&#8217;s too small to provide meaningful data. Conversely, a study can have thousands of subjects and still be underpowered.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.smallgraymatters.com/2006/12/04/a-primer-on-power/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
