<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Small Gray Matters &#187; methodology</title>
	<atom:link href="http://www.smallgraymatters.com/category/methodology/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.smallgraymatters.com</link>
	<description>of brains and their minds</description>
	<lastBuildDate>Fri, 18 Sep 2009 01:27:48 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>&#8220;The brain has hubs?&#8221;</title>
		<link>http://www.smallgraymatters.com/2008/07/01/the-brain-has-hubs/</link>
		<comments>http://www.smallgraymatters.com/2008/07/01/the-brain-has-hubs/#comments</comments>
		<pubDate>Wed, 02 Jul 2008 06:25:19 +0000</pubDate>
		<dc:creator>small and gray</dc:creator>
				<category><![CDATA[methodology]]></category>
		<category><![CDATA[neuroimaging]]></category>
		<category><![CDATA[research articles]]></category>
		<category><![CDATA[connectivity]]></category>
		<category><![CDATA[default network]]></category>
		<category><![CDATA[mri]]></category>
		<category><![CDATA[network structure]]></category>

		<guid isPermaLink="false">http://www.smallgraymatters.com/?p=30</guid>
		<description><![CDATA[If you read only one neuroimaging paper this week, make it this paper in PLoS Biology by Hagmann and colleagues. It&#8217;s a really remarkable combination of technical wizardry, creativity, and pretty, pretty pictures of the brain. What Hagmann et al have done is assemble rock-solid evidence that a network of brain regions located primarily in [...]]]></description>
			<content:encoded><![CDATA[<p>If you read only one neuroimaging paper this week, make it <a href="http://biology.plosjournals.org/perlserv/?request=get-document&amp;doi=10.1371/journal.pbio.0060159">this paper in PLoS Biology</a> by Hagmann and colleagues. It&#8217;s a really remarkable combination of technical wizardry, creativity, and pretty, pretty pictures of the brain. What Hagmann et al have done is assemble rock-solid evidence that a network of brain regions located primarily in posterior midline cortex serves as the structural &#8216;core&#8217; of the broader cortical connectivity map. Whereas most brain regions show sparse connectivity, typically talking to only a handful of other nearby regions , regions in the structural core are much more densely connected with one another and with other regions throughout the cortex. Hagmann et al. support this basic conclusion with five or six different analyses, each using a different network topology metric (herein lies the technical wizardry), but the bottom line is that they obtain much the same result no matter how they looked at the data.</p>
<p>What&#8217;s really striking about this study is that it&#8217;s arguably the best example to date (or at least, the best example that I know of&#8211;I don&#8217;t follow this literature closely) of the power that new structural MRI techniques provide to assess in vivo brain connectivity in humans. In this case, the authors used diffusion spectrum imaging, a technique that lets the researcher construct whole-brain images of white matter fiber density and then (using some sophisticated post-processing) plot the trajectories of those tracts. The authors defined a connection between regions as the presence of at least one fiber with end-points in both regions (the more terminating fibers, the stronger the connection). Given an N x N matrix (where N = 998 different brain regions in this case!) of connectivity strengths between regions, they could then apply the suite of network topology metrics to produce those <a href="http://biology.plosjournals.org/perlserv/?request=slideshow&amp;type=figure&amp;doi=10.1371/journal.pbio.0060159&amp;id=99755">pretty</a>, <a href="http://biology.plosjournals.org/perlserv/?request=slideshow&amp;type=figure&amp;doi=10.1371/journal.pbio.0060159&amp;id=99759">pretty</a> figures.</p>
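<p>To make the connectivity-matrix idea concrete, here is a minimal Python sketch. It assumes each reconstructed fiber has already been reduced to a pair of end-point region indices (the hypothetical <code>toy_fibers</code> list below). It applies the rule described above: a connection exists if at least one fiber terminates in both regions, and its strength is the fiber count. It then ranks regions by degree and total strength as a crude hub score. This is not Hagmann et al.&#8217;s pipeline. The function names, the toy data, and the use of degree and strength as the hub measure are illustrative assumptions, and the paper itself relies on several more sophisticated network metrics.</p>

```python
def build_connectivity(fibers, n_regions):
    """Symmetric matrix of connection strengths between regions.

    Each fiber is a (region_a, region_b) pair of end-point region indices.
    Two regions are connected if at least one fiber ends in both, and the
    connection strength is the number of such fibers.
    """
    matrix = [[0] * n_regions for _ in range(n_regions)]
    for a, b in fibers:
        if a == b:
            continue  # skip fibers that begin and end in the same region
        matrix[a][b] += 1
        matrix[b][a] += 1
    return matrix


def degree_and_strength(matrix):
    """Degree = number of regions connected to each region;
    strength = total fiber count over those connections."""
    degree = [sum(1 for w in row if w > 0) for row in matrix]
    strength = [sum(row) for row in matrix]
    return degree, strength


def rank_hubs(matrix, top_k=3):
    """Indices of the top_k regions by degree, ties broken by strength."""
    degree, strength = degree_and_strength(matrix)
    order = sorted(range(len(matrix)),
                   key=lambda i: (degree[i], strength[i]),
                   reverse=True)
    return order[:top_k]


# Toy data: 6 hypothetical regions, with region 0 wired to everything.
toy_fibers = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 1), (1, 2), (3, 4)]
conn = build_connectivity(toy_fibers, n_regions=6)
print(rank_hubs(conn, top_k=2))  # → [0, 1]
```

<p>With 998 regions the same logic yields a 998 &#215; 998 matrix. At that scale, an array library would be the more practical choice than nested lists.</p>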
<p>Lest you think this all sounds like black magic (as I suspect a reviewer or two did), Hagmann et al. provide evidence that these structure-based connectivity maps (a) are reliable across hemispheres and scanning sessions; (b) degrade gracefully in the presence of noise; (c) conform nicely to connectivity data obtained from more conventional anatomical tract tracing techniques in monkeys; and (d) are quantitatively very similar to maps obtained using functional resting-state data in the same participants.  The sheer breadth of analysis in this paper is really quite striking, and you&#8217;d have to nit-pick to find faults with the methodology.</p>
<p>That said, there&#8217;s one critical question that these results don&#8217;t really address, and that&#8217;s what the findings <em>mean</em> from a functional standpoint. It&#8217;s easy to make the general argument that a <a href="http://en.wikipedia.org/wiki/Small_world_network">small-world network structure</a> is A Good Thing &#8482; for the brain to have; but the (arguably) more interesting question is why the hubs are located in <em>these</em> particular brain regions. The fact that a majority of the hubs (including posterior cingulate, precuneus, lateral parietal cortex, and superior temporal sulcus) are components of the brain&#8217;s &#8220;default&#8221; or <a href="http://www.pnas.org/cgi/content/abstract/102/27/9673">task-negative</a> network is clearly no coincidence. So what functional purpose does this pattern of connectivity serve? Why do those regions that are maximally activated at rest have the broadest pattern of connectivity with the rest of the cortex? Or is it perhaps the other way around, so that these regions develop their default status precisely because they receive inputs from multiple sources, and are ideally situated to mediate transitions between different task sets? Clearly, many questions remain to be addressed (warning: a horribly clich&#233;d ending to this post is imminent), but the Hagmann et al. paper will probably turn out to be a pretty important piece of the puzzle (see, I warned you).</p>
<p>Hat-tip: <a href="http://scienceblogs.com/neurophilosophy/2008/07/hi_res_brain_topology_map.php">Neurophilosophy</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.smallgraymatters.com/2008/07/01/the-brain-has-hubs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Two cautionary notes on the use of fMRI</title>
		<link>http://www.smallgraymatters.com/2008/06/17/two-cautionary-notes-on-the-use-of-fmri/</link>
		<comments>http://www.smallgraymatters.com/2008/06/17/two-cautionary-notes-on-the-use-of-fmri/#comments</comments>
		<pubDate>Tue, 17 Jun 2008 07:04:20 +0000</pubDate>
		<dc:creator>small and gray</dc:creator>
				<category><![CDATA[fmri]]></category>
		<category><![CDATA[methodology]]></category>
		<category><![CDATA[neuroimaging]]></category>
		<category><![CDATA[news articles]]></category>
		<category><![CDATA[criticism]]></category>

		<guid isPermaLink="false">http://www.smallgraymatters.com/?p=28</guid>
		<description><![CDATA[This week&#8217;s issues of Science and Nature each have very nice commentaries on the limitations of fMRI, a topic I&#8217;ve written  about a few times before. The Nature piece is a review by Nikos Logothetis entitled &#8220;What we can  do and what we cannot do with fMRI&#8220;. Logothetis is uniquely placed to comment on these matters; a very large chunk of what we know about the BOLD signal (the primary [...]]]></description>
			<content:encoded><![CDATA[<p>This week&#8217;s issues of Science and Nature each have very nice commentaries on the limitations of fMRI, a topic I&#8217;ve<a href="http://www.smallgraymatters.com/2006/06/30/how-much-should-scientists-worry/"> written  about</a> <a href="http://www.smallgraymatters.com/2006/06/27/in-unnecessary-defense-of-neuroimaging-a-comment-on-paul-bloom/">a few</a> <a href="http://www.smallgraymatters.com/2006/06/28/neurons-blood-flow-and-their-intimate-relationship/">times</a> <a href="http://www.smallgraymatters.com/2006/07/09/more-on-fmri/">before</a>. The Nature piece is a review by Nikos Logothetis entitled &#8220;<a href="http://www.nature.com/nature/journal/v453/n7197/full/nature06976.html">What we can  do and what we cannot do with fMRI</a>&#8220;. Logothetis is uniquely placed to comment on these matters; a very large chunk of what we know about the BOLD signal (the primary vehicle of fMRI studies) is due to <a href="http://www.smallgraymatters.com/2006/06/28/neurons-blood-flow-and-their-intimate-relationship/">his seminal work</a>. While the review is pretty expansive (particularly for Nature, at 10 pages!) and somewhat technical, the take-home message is that the most serious limitations of fMRI are due to massive aggregation over  distinct populations of neurons rather than to any technical limitations per se. Or, as he puts it much more eloquently:</p>
<blockquote><p>The limitations of fMRI are not related to physics or poor engineering, and are unlikely to be resolved by increasing the sophistication and power of the scanners; they are instead due to the circuitry and functional organization of the brain, as well as to inappropriate experimental protocols that ignore this organization.</p></blockquote>
<p>That&#8217;s not to say that all is lost, of course. On the whole, Logothetis is pretty optimistic about the value of fMRI, even going so far as to suggest that &#8220;MRI is currently the best tool we have for gaining insights into brain function and formulating interesting and eventually testable hypotheses&#8221;; it&#8217;s just that it&#8217;s not perfect by a long shot.  But anyway, there&#8217;s much more to the review than I can convey coherently in my current sleepy state, so if you have access to Nature, <a href="http://www.nature.com/nature/journal/v453/n7197/full/nature06976.html">it&#8217;s definitely worth reading</a>.</p>
<p>The Science piece (<a href="http://www.sciencemag.org/cgi/content/full/320/5882/1412">&#8220;Growing Pains for fMRI&#8221;</a>) is a much lighter news article by Greg Miller, and it focuses mostly on a controversy that played out in the pages of the New York Times last year. The thumbnail sketch is that one group of fMRI researchers did some very shoddy &#8220;research&#8221; on the way people view the different election candidates, and another (larger) group of researchers called them on it. The exchange then led to a period of widespread soul-searching amongst cognitive neuroscientists, until ultimately, in March 2008, the Cognitive Neuroscience Society imposed a moratorium on publication of all fMRI data until a common set of guidelines for rigorous and ethical research conduct was agreed upon. OK, that last part is completely made up. But the point is that the article is a good read, and you should check it out if you can. It&#8217;s not often you hear one scientist say that another scientist&#8217;s study was &#8220;really closer to astrology than it was to real science&#8221; (for the record, I agree with that assessment in this case).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.smallgraymatters.com/2008/06/17/two-cautionary-notes-on-the-use-of-fmri/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>trendspotting the fMRI literature</title>
		<link>http://www.smallgraymatters.com/2007/01/08/trendspotting-the-fmri-literature/</link>
		<comments>http://www.smallgraymatters.com/2007/01/08/trendspotting-the-fmri-literature/#comments</comments>
		<pubDate>Tue, 09 Jan 2007 06:43:58 +0000</pubDate>
		<dc:creator>small and gray</dc:creator>
				<category><![CDATA[academics]]></category>
		<category><![CDATA[fmri]]></category>
		<category><![CDATA[methodology]]></category>

		<guid isPermaLink="false">http://www.smallgraymatters.com/2007/01/08/trendspotting-the-fmri-literature/</guid>
		<description><![CDATA[Select a few neuroimaging papers at random and you’re likely to come across a handful of statements in the introduction to the effect that the topic under study is of “increasing interest”. At conferences and research talks, you’ll sometimes see speakers invoke a familiar kind of figure that looks something like this:

That’s the number of [...]]]></description>
			<content:encoded><![CDATA[<p>Select a few neuroimaging papers at random and you’re likely to come across a handful of statements in the introduction to the effect that the topic under study is of “increasing interest”. At conferences and research talks, you’ll sometimes see speakers invoke a familiar kind of figure that looks something like this:</p>
<p><img title="Number of 'language and fmri' citations in PubMed, 1996-2006" alt="Number of 'language and fmri' citations in PubMed, 1996-2006" src="http://www.smallgraymatters.com/images/language_1.jpg" /></p>
<p>That’s the number of citations in PubMed containing the terms ‘fMRI’ and ‘language’ in the abstract or title, plotted by year of publication. Figures like this purport to show that interest in a topic is increasing dramatically. Just look at that increase! In 1996, there were only 13 hits; by 2005, there were 99! It’s as clear as daylight that interest in the neural bases of language is increasing!</p>
<p>Of course, the poorly kept secret is that fMRI didn’t exist twenty years ago, and wasn’t really widely adopted until the last few years. So it’s natural to see an increase in publications that study language using neuroimaging methods. You’d expect a similar increase for almost <em>every</em> other area of research. The more pertinent question is whether interest in a particular topic has increased <em>disproportionately</em> relative to the general increase in the use of fMRI over the last few years. Instead of plotting absolute numbers, what we want is something like this:</p>
<p><img src="http://www.smallgraymatters.com/images/language_2.jpg" /></p>
<p>In the above figure, the pink line represents the number of papers with the terms ‘fMRI’ and ‘language’ in the title or abstract (the blue line in the first figure has now turned pink&#8211;sorry about the color confusion!). But now the additional (blue) line shows the number of papers that have just the term ‘fMRI’ in the title or abstract. The increase in language papers starts to look suspect, since it&#8217;s clear that the increase in fMRI papers on language is essentially paralleled by the increase in fMRI papers in general. Here’s an even better representation:</p>
<p><img src="http://www.smallgraymatters.com/images/language_3.jpg" /></p>
<p>That’s the proportion of PubMed studies with the terms ‘fMRI’ and ‘language’ in the title or abstract over the last few years relative to the total number of studies with just the term “fMRI”. As you can see, it’s a very different picture. It’s a small sample size, but there’s not much reason to think people are any more interested in studying language in 2006 than in 1998—at least, <em>relative to interest in other topics that can be studied with fMRI.</em></p>
<p>So what to make of claims that research interest is increasing in topics X, Y, and Z? Well, in a sense those claims are true, since the total number of neuroimaging publications continues to rise fairly dramatically. But in the sense that researchers probably care about more—namely, the “if I have a magnet and I want to do a study, what’s a hot topic right now?” sense—most research topics <em>can’t</em> be on the rise, by definition (just like most people can’t be of above average intelligence). Moreover, the number of academic publications <em>in general</em> has increased pretty dramatically over the last few years, so it’s not even clear from the above just how much of the increase in the number of fMRI papers on language is due to greater adoption of fMRI as opposed to a more global increase in scientific research output.</p>
<p>Now, the point of this post isn’t just to malign a ubiquitous research tactic. One can’t really fault people for wanting to think their own research is more interesting than other people’s. I’ll be the first to confess that I’ve inserted into my own papers some rather disingenuous comments about how oh-so-fascinating my results are and how much they (should) mean to other researchers. It’s hard to motivate a paper without doing that to some degree, or even to get motivated to do the research in the first place. What the second graph above does point up, though, is that the question of which topics are ‘hot’ is an empirical one, and fortunately, one that can be tested relatively easily (though imprecisely).</p>
<p>To generate the above graphs, I used data from PubMed. One of the many nice things about PubMed is that it has <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html">an API</a> that allows you to access the database programmatically (in contrast to Google Scholar, which is inaccessible via API due to agreements between Google and the major publishers to keep it that way). So, in the interest of doing some trendspotting, I wrote a small Visual Basic program to quantify the emergence (or lack thereof) of real ‘trends’ in research. I used the search string “fMRI [tiab]” as the control, i.e., all articles containing the string “fMRI” in the title or abstract. This is a conservative approach, since the standard PubMed search is not restricted to titles and abstracts, and the difference in hits is more than an order of magnitude (7000 vs. 160000). But the more conservative approach is likely more accurate, since a study that includes the term in its title or abstract is much more likely to report original fMRI data than a study that just mentions the term in passing.</p>
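<p>The original program was written in Visual Basic and hasn&#8217;t been released, so what follows is only a rough Python sketch of the same idea. It assumes NCBI&#8217;s E-utilities <code>esearch</code> endpoint, which is where the API linked above now lives. The sketch builds a URL that asks PubMed for the number of hits for a title/abstract search restricted to one publication year (via the <code>[dp]</code> field tag), and it pulls the <code>Count</code> element out of the XML reply. The helper names are hypothetical, and the sketch deliberately makes no network requests.</p>

```python
import urllib.parse
import xml.etree.ElementTree as ET

# NCBI E-utilities search endpoint (assumed current location of the PubMed API).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(term, year):
    """Return an esearch URL asking only for the hit count of `term` in `year`."""
    params = {
        "db": "pubmed",
        # [dp] restricts the search to articles with that publication date.
        "term": f"({term}) AND {year}[dp]",
        # rettype=count asks esearch to return just the number of matches.
        "rettype": "count",
    }
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_count(xml_text):
    """Pull the top-level Count element out of an esearch XML response."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


# Control search and one topic-specific search, both limited to title/abstract.
control_url = build_count_url("fMRI[tiab]", 2006)
language_url = build_count_url("fMRI[tiab] AND language[tiab]", 2006)
print(control_url)
```

<p>To run a real query, you would pass each URL to <code>urllib.request.urlopen</code>, hand the decoded response to <code>parse_count</code>, and stay within NCBI&#8217;s published request-rate limits.</p>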
<p>This reference count (broken down by year) was then compared with the results of a series of more specific searches. Basically, for a variety of topics, I added a single search term like “language” or “emotion” to the basic search. Again, only titles and abstracts were searched. The ratio of hits for the specific search to hits for the general search was then plotted for each year in order to highlight potential trends.</p>
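<p>The normalization step itself is just a per-year division. Here is a minimal Python sketch, assuming the yearly hit counts have already been collected into dictionaries keyed by year. The counts in the example are invented. They show that a topic whose absolute numbers double can still hold a perfectly flat share of the fMRI literature.</p>

```python
def relative_share(specific_counts, general_counts):
    """Per-year ratio of hits for the specific search to hits for the control.

    Both arguments map year -> hit count. Years missing from the control, or
    with a control count of zero, are skipped rather than divided by zero.
    """
    shares = {}
    for year in sorted(specific_counts):
        total = general_counts.get(year, 0)
        if total > 0:
            shares[year] = specific_counts[year] / total
    return shares


# Hypothetical counts, for illustration only (not the data behind the figures).
language = {2004: 20, 2005: 30, 2006: 40}
fmri_all = {2004: 400, 2005: 600, 2006: 800}
print(relative_share(language, fmri_all))  # → {2004: 0.05, 2005: 0.05, 2006: 0.05}
```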
<p>What do the results look like? Here are the ‘trends’ in neuroimaging for four major areas of research, broken down for the years 1996-2006:</p>
<p><img src="http://www.smallgraymatters.com/images/domains_1.jpg" /></p>
<p>What can we infer from the above figure? Well, just by eyeballing it, it looks like there’s a general trend toward relative increases in the number of papers on emotion, working memory, and attention, and no change for language. Statistical tests reveal that the three positive trends are significant (p < .05 for all three). So there’s at least some evidence that there are in fact trends in neuroimaging research (assuming there isn’t some alternative explanation, e.g., abstracts just getting longer and consequently mentioning more terms). The key point is that this kind of information can’t be gleaned just by looking at the first figure presented in this post. Absolute increases in publication count aren’t particularly informative. In contrast, when you use a control condition—though in this case, an admittedly crude one—you can feel a little more confident about the conclusions you’re able to draw. Naturally, this is a small sample size, and as I mentioned, the search is highly conservative (obviously, more than 46 fMRI articles on emotion were published in 2006!). But it’s likely that the results are a good representation of what’s out there, and that we can safely generalize to the many papers that use fMRI to study these topics but didn’t use the exact term in the abstract.</p>
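<p>The post doesn&#8217;t say exactly which test produced those p-values. One plausible version is an ordinary least-squares regression of each topic&#8217;s yearly share on publication year, with the slope tested against zero using a t statistic on n - 2 degrees of freedom. The Python sketch below implements that version; it is an assumption, not a description of the original analysis. The yearly shares are made-up numbers, not the values behind the figure.</p>

```python
import math


def slope_t_test(years, shares):
    """OLS slope of share on year, its standard error, and the t statistic.

    Under the null hypothesis of no trend, t follows a t distribution with
    len(years) - 2 degrees of freedom.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, shares))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se


# Hypothetical yearly shares for 1996-2006 (11 years, so 9 degrees of freedom).
years = list(range(1996, 2007))
shares = [0.020, 0.024, 0.022, 0.027, 0.030, 0.029, 0.033, 0.035, 0.034, 0.038, 0.041]
slope, se, t = slope_t_test(years, shares)

# Two-sided critical value of t at alpha = .05 with 9 degrees of freedom.
T_CRIT_9DF = 2.262
print(f"slope={slope:.5f} per year, t={t:.2f}, significant={abs(t) > T_CRIT_9DF}")
```

<p>The 2.262 cutoff applies only to an 11-year series. A different number of years changes the degrees of freedom, and therefore the critical value.</p>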
<p>What about other ways of carving up the literature? Here’s the breakdown by sensory modality:<br />
<img src="http://www.smallgraymatters.com/images/domains_2.jpg" /></p>
<p>Doesn’t look like much is going on, and indeed none of the regression slopes are statistically significant. But at least this analysis is somewhat reassuring given the increases seen above for working memory, attention, and emotion: it’s clearly not as though <em>all</em> search terms are being mentioned more frequently in more recent fMRI abstracts.</p>
<p>Here’s one last figure (this could obviously go on for a very long time) plotting the trajectory of publication count in a few less-studied domains:</p>
<p><img src="http://www.smallgraymatters.com/images/domains_3.jpg" /></p>
<p>The trends for ‘social’, ‘reward’, and ‘decision making’ are significant here, but the trendline for pain isn’t. Social neuroscience research in particular appears to be emerging as a prominent domain of fMRI research, more than doubling its relative share of the literature between 2005 and 2006, though it’s still a relatively small field.</p>
<p>In evaluating the figures above, there are several caveats to keep in mind. One major limitation of this trendspotting approach is that it’s not well-suited to quantifying trends in more fine-grained areas of research, because there may only be a handful of studies per year, resulting in a pretty unreliable measure. Then again, claims that one small niche of research within the broader field of cognitive neuroscience is on the rise probably aren’t that interesting to begin with. If a particular topic was studied by 2 people in 2000 and 6 in 2005 (instead of a projection of, say, 4), you might want to wait a while before hopping on the bandwagon.</p>
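<p>The 2-versus-6 example can be given a rough number. The Python sketch below treats the projected count of 4 as the mean of a Poisson distribution and asks how often a count of 6 or more would turn up by chance alone. The Poisson model is an assumption introduced for illustration; the analysis above did not use it.</p>

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the probability mass function."""
    below_k = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


# Hypothetical: treat the projected 4 studies as the expected count,
# and ask how likely 6 or more studies would be by chance alone.
p = poisson_upper_tail(6, 4.0)
print(round(p, 3))  # → 0.215
```

<p>Under this model, a jump from a projected 4 to an observed 6 happens roughly one time in five, so on its own it is weak evidence of a real surge.</p>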
<p>Another obvious limitation is that the procedure I used to generate these graphs was extremely simplistic. One can easily imagine more sophisticated approaches that control much more tightly for potential confounds (e.g., tier of journal, mean abstract length, etc.) and use better quantitative measures than the simple ratio I used above. That’s OK, though; the point I want to make isn’t that this particular set of graphs provides a particularly accurate insight into the state of the field of neuroimaging. Rather, the point is that scientific trends can be studied empirically just like anything else, and there’s a massive amount of data freely available for mining. Entire journals are devoted to tracking and discussing current research fads (see the <a href="http://www.trends.com">‘Trends in…’ series</a>), but it’s unclear whether the editors at such outlets make their decisions on the basis of quantitative information. Conversely, from an author’s perspective, knowing what’s hot isn’t just a matter of curiosity: careful attention to trends could conceivably increase the rate of acceptance of one’s publications.</p>
<p>As a side note, if anyone wants to suggest possible searches for trends they’d like to see quantified, feel free to leave a comment below or to email me. I may release the VB program at some point, but it’s in no shape to see the light of day at the moment. Of course, you can always head over to PubMed and enter search terms manually.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.smallgraymatters.com/2007/01/08/trendspotting-the-fmri-literature/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A primer on power</title>
		<link>http://www.smallgraymatters.com/2006/12/04/a-primer-on-power/</link>
		<comments>http://www.smallgraymatters.com/2006/12/04/a-primer-on-power/#comments</comments>
		<pubDate>Tue, 05 Dec 2006 05:06:23 +0000</pubDate>
		<dc:creator>small and gray</dc:creator>
				<category><![CDATA[methodology]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[tutorials]]></category>

		<guid isPermaLink="false">http://www.smallgraymatters.com/2006/12/04/a-primer-on-power/</guid>
		<description><![CDATA[I&#8217;d like to title this post “a power primer,” but that’s the title of a 1992 Psychological Bulletin article by Jacob Cohen (the god of power analysis, now deceased). So instead I’ve titled it “a primer on power.” By changing a few words around I’ve very cleverly gone from academic plagiarism to paying homage. (And [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;d like to title this post “a power primer,” but that’s the title of <a href="http://www.education.wisc.edu/elpa/academics/syllabi/2006/06Spring/825Borman/Cohen1992.pdf">a 1992 Psychological Bulletin article by Jacob Cohen</a> (the god of power analysis, now deceased). So instead I’ve titled it “a primer on power.” By changing a few words around I’ve very cleverly gone from academic plagiarism to paying homage. (And it really is one: I think Cohen’s article, and his lengthier works on power, should be required reading for behavioral scientists of all stripes).</p>
<p>Power is one of the most misunderstood and/or underappreciated concepts in scientific research. Simply put, it refers to the probability of detecting an effect in your sample when it is in fact present in the population (i.e., when it’s ‘real’). If your study has, say, 90% power to detect a difference in the length of socks worn by basketball players as compared to soccer players, that means that <em>if there really is a difference</em> between basketball and soccer players’ sock length, there’s a 9 in 10 chance on average that you’ll be able to detect it in your sample.</p>
<p>In general, power is a good thing, and you want to have as much of it as you can. In an ideal world, scientific experiments would have 100% power to detect effects. Unfortunately, that doesn’t happen in the real world, because to have 100% power (i.e., complete certainty), you’d need to sample the entire population of interest, which isn’t very practical (that’s a lot of players, and twice as many socks). In practice, researchers’ sample sizes are constrained by resource considerations. And so, as a result, is power. Any time you conduct an experiment with a finite sample, you’re taking the risk that you might miss an effect even if it really does exist, simply because of blind (mis)fortune. And in general, the smaller your sample, the greater the probability that you’ll miss an effect. This idea is intuitive enough to most people: it seems pretty obvious that if you want to know whether men are taller than women, you don’t want to base your judgment on the difference in height between just one man and one woman. If you did, you&#8217;d run the risk that you just happened to pick a particularly short man and/or a particularly tall woman. The more men and women you measure, the more the random variations from the mean average out, and the smaller the odds of mistakenly concluding that there’s no gender difference in height.</p>
<p>Where confusion starts to set in (and the impetus for this post) is that the intimate link between sample size and power often leads people (including many scientists) to suppose that there’s a single ‘right’ sample size for all research studies of a particular kind. It’s not uncommon to hear people say things like, “we can&#8217;t trust that study because it&#8217;s based on only 50 people! They need at least 300 to be able to say anything meaningful about the general population!” (Actually this sort of statement also betrays another kind of confusion that relates to the difference between Type I and Type II errors, but that’s a separate issue). The problem is that statistical power depends not only on sample size, but also on two other numbers: the size of the effect, and the stipulated false positive rate (also referred to as alpha, or the Type I error rate).</p>
<p>The importance of the first of these, effect size, is easy to see intuitively. Suppose that the average height difference between men and women were 2 feet rather than several inches. How hard would it be to detect that difference and conclude it exists? Not very. A group of curious alien taxonomists wouldn’t need to abduct very many humans before they figured the gender difference out, simply because the vast majority of men would be taller than the vast majority of women, and the difference would hit the aliens right between the antennae. On the other hand, if the mean height difference were only 1/10th of an inch, our aliens would need to abduct a lot of humans and measure them very carefully before they’d be in a good position to claim that a height difference exists. Simply put, if the effect you’re looking for is large, it takes fewer subjects to detect it. Or, more formally, holding sample size constant, one’s power to detect an effect increases with the magnitude of the effect.</p>
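<p>The effect-size relationship is easy to check numerically. The Python sketch below uses the textbook normal approximation to the power of a two-sided, two-sample comparison. Effects are expressed as standardized mean differences (Cohen&#8217;s d, for which 0.2, 0.5, and 0.8 are his conventional small, medium, and large values). Treating the test as a z-test rather than a t-test is a simplifying assumption, so the numbers are approximate. The pattern still holds: at a fixed sample size, larger effects give more power.</p>

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test.

    d is the standardized mean difference (Cohen's d). The negligible chance
    of rejecting in the wrong direction is ignored.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = d * (n_per_group / 2) ** 0.5
    return std_normal.cdf(shift - z_crit)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: power with 30 per group = {approx_power(d, 30):.2f}")
```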
<p>The second parameter, false positive rate, is somewhat less intuitive. The basic idea is that, since sampling is random and error necessarily creeps in, on rare occasions researchers are going to end up concluding that an effect exists in the population even though it doesn’t really. Just how often such errors occur is typically a matter of stipulation: scientists will decide that they can accept a false positive occurring, say, 1 out of every 20 times, and adjust their statistical tests accordingly. Conventionally, the false positive rate is set to 5% (and significance tests are therefore conducted at p < .05). Because the convention is so strong, it’s often easy to overlook the false positive rate in power calculations and just default to the standard 5% level. Nonetheless, there is a relationship: the more conservative your statistical test (i.e., the smaller the false positive rate you're willing to accept), the lower your power for a given sample size and effect size. In less technical terms, it's kind of like saying that if you only want to be <em>fairly</em> sure that an effect holds true, you don&#8217;t need to look very hard. But if you want to be <em>really</em> sure, you need to double- and triple-check. And double- and triple-checking requires more observations (i.e., more subjects).</p>
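<p>The cost of a stricter false positive rate can also be seen by simulation. In this Python sketch, many two-group studies are simulated with a real effect of d = 0.5 and 20 subjects per group, and each study is tested at several alpha levels; the effect size, sample size, and random seed are arbitrary illustrative choices. To keep the code short, the test is a z-test that treats the population variance as known. The same simulated studies are reused at every alpha, so the only thing that changes is how extreme a result must be to count as significant. The detection rate falls as alpha shrinks.</p>

```python
import random
from statistics import NormalDist, mean


def simulated_power(d, n_per_group, alphas, n_sims=2000, seed=1):
    """Share of simulated studies that reach significance at each alpha.

    Each study draws two groups of unit-variance normal scores whose true
    means differ by d, then applies a two-sided z-test that treats the
    variance as known. The same studies are reused for every alpha.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    critical = {a: std_normal.inv_cdf(1 - a / 2) for a in alphas}
    hits = {a: 0 for a in alphas}
    se = (2 / n_per_group) ** 0.5  # standard error of a difference in means
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(d, 1.0) for _ in range(n_per_group)]
        z_stat = abs(mean(treated) - mean(control)) / se
        for a, crit in critical.items():
            if z_stat > crit:
                hits[a] += 1
    return {a: hits[a] / n_sims for a in alphas}


print(simulated_power(d=0.5, n_per_group=20, alphas=(0.05, 0.01, 0.001)))
```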
<p>Given that power depends on these three parameters (sample size, effect size, and false positive rate), how much power is enough? It’s widely accepted that a reasonable level of power is 80-85%. I say ‘widely accepted’ because when people stop to think about what level of power they find acceptable, their answer tends to be in that ballpark (i.e., 4 times out of 5, your experiment will detect the effect you want if it really exists). But that’s not to say that most studies actually <em>have</em> that level of power in practice. One of the most remarkable findings (and one that’s been demonstrated over and over again) made by statisticians interested in power is that an absurdly large proportion of studies in many disciplines simply don’t have the necessary power to detect the effects they hypothesize. In the article I linked to at the beginning, Jacob Cohen points out that an analysis he conducted in 1960 indicated that the average social psychology study had only 48% power to detect moderate-sized effects. In Cohen’s words, “the chance of obtaining a significant result was about that of tossing a head with a fair coin” (p. 155). And that’s on <em>average</em>; presumably there are a good number of studies that have set out to identify effects they have <em>no real chance of detecting even if they’re actually present in the population</em>.</p>
<p>Cohen then went on to note that other statisticians conducting similar reviews have found no improvement in the average level of power in the decades since. For anyone actively involved in research (or even for casual consumers of science), this should raise red flags all over the place. There really is no excuse for failing to do a simple power calculation before beginning to collect data. It’s not as though power calculation is a tedious process: all you have to do is plug two or three numbers into an online worksheet, and poof, you get your answer instantly. And yet many, maybe even most, scientists fail to do so.</p>
<p>In fairness, doing a power calculation isn’t quite <em>that</em> easy, because you rarely know the exact size of the effect you’re seeking. If you did, you probably wouldn’t need to do the study in the first place! While it’s easy to decide you’d like your study to have, say, 80% power, it’s not so easy to come up with a reasonable estimate of effect size.</p>
<p>Suppose, for example, that we want to know if there’s a correlation between people’s mood and the amount of television they watch daily. Let’s stipulate that our power has to be around 80% (we don’t want to do our study unless we think there’s at least a 4 in 5 chance of detecting an effect), and that we’ll test our hypothesis at the conventional level of p < .05. How many subjects do we need to collect data from? Well, it depends. If the correlation between mood and television watching in the general population is <em>large</em> (canonically, around r = .5), we’re only going to need 29 people to have an 80% chance of detecting it. If it’s <em>medium</em> (say, r = .3), we’re going to have to round up 85 people. But if it’s only a <em>small</em> effect (say, r = .1, meaning the two measures share only 1% of their variance), we’re faced with the daunting prospect of chasing down 785 subjects! Note that in all 3 of these cases, we’re assuming that there <em>really is a correlation between mood and television-watching</em>. The only difference is how strong that effect is.</p>
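<p>Sample sizes like these can be approximated with a few lines of Python. The sketch below assumes a two-tailed test of a single correlation and relies on the Fisher z approximation; the helper name <code>n_for_correlation</code> is hypothetical. Dedicated power software uses exact methods, so its answers can differ by a subject or two.</p>

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a population correlation r (two-tailed).

    Sketch based on the Fisher z approximation; exact methods may differ slightly.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the false positive rate
    z_power = z.inv_cdf(power)           # quantile corresponding to the desired power
    n = ((z_alpha + z_power) / math.atanh(r)) ** 2 + 3
    return math.ceil(n)                  # can't recruit a fraction of a subject


for label, r in (("large", 0.5), ("medium", 0.3), ("small", 0.1)):
    print(f"{label} (r = {r}): {n_for_correlation(r)} subjects")
```

<p>This approximation gives 30, 85, and 783 subjects respectively, close to the figures quoted above.</p>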
<p>Of course, power calculations don’t always have to mean bad news. For example, in my area of research (functional neuroimaging), power calculations are often quite comforting. It’s an interesting quirk that people often criticize imaging studies for having small samples, when in fact imaging studies probably don’t have lower power on average than other kinds of studies (at least for standard experimental, within-subject analyses). The knee-jerk reaction is understandable, though, because many psychologists (particularly in social or personality psychology) are used to working with samples in the hundreds. If that’s your background, it’s no surprise that when you come across neuroimaging studies that used samples of only 15 subjects (a pretty standard size), you’re going to think something’s horribly wrong.</p>
<p>In fact, there’s nothing wrong, because it turns out (fortuitously!) that effect sizes in functional neuroimaging studies tend to be huge. It’s not uncommon to see effect sizes around d = 2 (d is a standardized measure of effect size popularized by Cohen; it’s measured in standard deviations, so a d of 2 means the difference in neural activation between two experimental conditions is around 2 standard deviations). Effects that large are unheard of in most other disciplines. For comparison, Cohen himself treated anything above d = 0.8 as a ‘large’ effect (this is just a heuristic, of course; the meaning of ‘large’ differs considerably across research areas!).</p>
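<p>For a within-subject contrast, one common way to compute d (sometimes written d<sub>z</sub>) divides the mean difference between conditions by the standard deviation of those differences. The sketch below assumes that variant (other definitions of d for repeated measures exist and give different values), and the activation values are made up purely for illustration.</p>

```python
from statistics import mean, stdev


def cohens_dz(condition_a, condition_b):
    """Standardized mean difference for paired data: mean(diff) / sd(diff)."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    return mean(diffs) / stdev(diffs)   # stdev uses the n - 1 (sample) denominator


# Hypothetical activation estimates for 5 subjects in two conditions.
task = [5.0, 6.0, 4.5, 3.0, 7.0]
rest = [4.0, 3.0, 2.5, 2.0, 4.0]
print(cohens_dz(task, rest))  # 2.0
```

<p>Here the task-minus-rest differences average 2 units with a standard deviation of 1 unit, so the effect is 2 standard deviations.</p>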
<p>A quick power calculation reveals that a study with 12 subjects has essentially 100% power to detect an effect size of 2 at p < .05. Basically, if the population effect really is that big, you’re not going to miss it. In fact, with only 2 subjects, you’d still have an 88% shot of detecting it. This helps explain why early neuroimaging studies, which often had only 3 or 4 subjects, were able to obtain replicable results. In the early days, when little was known about the relationship between specific cognitive tasks and neural activity in humans, researchers used very broad task contrasts specifically intended to elicit very large, very obvious changes in activation (e.g., comparing activation during a working memory task to a passive resting state). The effects were (not surprisingly, in hindsight) enormous. As our knowledge of the functional neuroanatomy of cognition builds up, hypotheses become more subtle and effect sizes diminish, requiring larger samples.</p>
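<p>A rough version of this kind of calculation can be sketched in Python using a normal approximation to a within-subject (one-sample) test. The function name <code>approx_power_within</code> is hypothetical, and because the approximation ignores the heavier tails of the t distribution, it overstates power for very small samples; treat it as a ballpark figure, not a substitute for an exact calculation.</p>

```python
import math
from statistics import NormalDist


def approx_power_within(d, n, alpha=0.05):
    """Approximate two-tailed power for a within-subject effect of size d with n subjects.

    Normal approximation only: optimistic when n is very small.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n)   # expected standardized test statistic
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# A huge effect (d = 2) versus a more modest one (d = 0.5), both with 12 subjects.
for d in (2.0, 0.5):
    print(f"d = {d}, n = 12: power = {approx_power_within(d, 12):.3f}")
```

<p>Under this approximation, the d = 2 case comes out at essentially 100%, while a more subtle d = 0.5 effect with the same 12 subjects has only around 40% power, which is why shrinking effect sizes call for larger samples.</p>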
<p>Of course, imaging studies usually don’t test for effects at p < .05, for reasons I won’t go into here (mainly the need to correct for multiple comparisons). Still, even at p < .001, a study with 15 subjects has 70% power. That’s not great, but it’s comparable to what you’ll find in many behavioral studies. Bump the sample up to 20 subjects, and power is now 92%, which is more than acceptable.</p>
<p>Hopefully, these examples make clear the importance of (a) conducting power calculations <em>before</em> starting to collect data, and (b) having some reasonable notion of what the population effect size might be (e.g., based on related effects that have already been identified). Even if you&#8217;re never going to collect any data yourself, and just want to be an informed consumer of scientific literature, it pays to know something about power. Remember: effect size matters. The fact that a study only has 10 people doesn&#8217;t necessarily mean it&#8217;s too small to provide meaningful data. Conversely, a study can have thousands of subjects and still be underpowered.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.smallgraymatters.com/2006/12/04/a-primer-on-power/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
